Multimodal Interpretable Model for Predicting Complete Pathological Response After Neoadjuvant Therapy in Lung Cancer: A Multicenter Study

Liu Qi; Liu Junying; Zhang Pei; Wei Ping; Liu Yueping

doi:10.12290/xhyxzz.2026-0398

Abstract

Objective: To develop an interpretable multimodal model that integrates whole slide images (WSI) with clinical features, and to validate its performance in predicting complete pathologic response (CPR) after neoadjuvant therapy in patients with lung cancer.

Methods: Clinicopathologic data and hematoxylin and eosin (H&E)-stained slides were retrospectively collected from patients with lung cancer who received neoadjuvant therapy between March 2015 and March 2025 at the Fourth Hospital of Hebei Medical University, the Affiliated Hospital of Hebei University, and Handan First Hospital. For the WSI branch, five pathology foundation models (CTransPath, Virchow2, H-optimus-0, Phikon-V2, and UNI-V2) were compared as feature extractors within the single-branch clustering-constrained attention multiple instance learning (CLAM-SB) framework to select the best-performing one. The clinical branch was built with an extreme gradient boosting (XGBoost) model. The two branches were then combined by decision-level logistic regression fusion to establish the multimodal prediction model for complete pathologic response (MP-CPR). Model performance was evaluated with the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and Brier score. Interpretability was analyzed with attention heatmaps and Shapley additive explanations (SHAP).

Results: A total of 728 patients were included. The 676 patients from the Fourth Hospital of Hebei Medical University formed the internal cohort and were randomly divided at a ratio of 8:1:1 into a training set (n=536), a validation set (n=70), and an internal test set (n=70). The 32 patients from the Affiliated Hospital of Hebei University and the 20 patients from Handan First Hospital served as external validation set 1 and external validation set 2, respectively. In the internal test set, UNI-V2 was the best-performing feature extractor for the WSI branch, with an AUC of 0.774 (95% CI: 0.688–0.861). The MP-CPR model built on this extractor achieved an AUC of 0.812 (95% CI: 0.725–0.899) in the internal test set, higher than those of the WSI-only model (0.774) and the clinical-only model (0.780), with an accuracy of 81.8%, a sensitivity of 70.6%, a specificity of 86.5%, and a Brier score of 0.125. In external validation set 1, MP-CPR achieved an AUC of 0.746 (95% CI: 0.598–0.898) and an accuracy of 75.0%; in external validation set 2, it achieved an AUC of 0.722 (95% CI: 0.538–0.918) and an accuracy of 80.0%. Attention heatmaps showed that the regions the model attended to were concentrated mainly in the tumor parenchyma and in treatment-response-related peritumoral areas. SHAP analysis identified preoperative treatment regimen, pathological diagnosis, tumor-to-stroma ratio, age, and smoking history as the five variables contributing most to CPR prediction.

Conclusion: The MP-CPR model, developed and validated on multicenter data, outperformed single-modality models in the overall prediction of CPR in lung cancer, and its interpretability results were broadly consistent with clinicopathologic diagnosis. The model may serve as an auxiliary reference for evaluating treatment response and optimizing individualized treatment decisions.
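
In the WSI branch, each slide is handled as a bag of patches: a pathology foundation model (UNI-V2 performed best) encodes every patch, and CLAM-SB pools the patch features into one slide-level representation with learned attention. The sketch below shows only the gated-attention pooling step that this kind of model builds on, assuming toy, hand-set parameters: attention scores from tanh and sigmoid projections, a softmax over patches, and an attention-weighted sum. It omits CLAM's instance-level clustering loss, the slide classifier, and all training, and the names `gated_attention_pool`, `V`, `U`, and `w` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gated_attention_pool(
    patches: Sequence[Sequence[float]],
    V: Matrix,
    U: Matrix,
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Gated attention MIL pooling over one slide's patch embeddings.

    Returns (attention weight per patch, attention-weighted slide embedding).
    V, U and w stand in for learned parameters; here they are hypothetical toy values.
    """
    scores = []
    for h in patches:
        content = [math.tanh(z) for z in _matvec(V, h)]             # tanh(V h)
        gate = [1.0 / (1.0 + math.exp(-z)) for z in _matvec(U, h)]  # sigmoid(U h)
        scores.append(sum(wi * c * g for wi, c, g in zip(w, content, gate)))
    top = max(scores)                    # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]     # softmax over patches: weights sum to 1
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(attn, patches)) for d in range(dim)]
    return attn, slide


if __name__ == "__main__":
    # Three toy 2-D "patch embeddings"; real foundation-model features are far wider.
    patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    attn, slide = gated_attention_pool(patches, V=[[1.0, 0.0]], U=[[1.0, 0.0]], w=[1.0])
    print([round(a, 3) for a in attn], [round(s, 3) for s in slide])
```

The per-patch weights in `attn` are the kind of quantity that attention heatmaps project back onto slide coordinates, which is how regions such as the tumor parenchyma come to be highlighted.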
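
The abstract says that MP-CPR fuses the two branches by decision-level logistic regression but does not describe the inputs or the fitting procedure. A minimal sketch of one common form of such late fusion, assuming that the logit of each branch's predicted CPR probability is used as an input and that the weights are fitted by plain batch gradient descent on log loss (not necessarily the authors' setup); `fit_fusion` and `predict_fusion` are hypothetical names:

```python
import math
from typing import Sequence, Tuple


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fusion(
    p_wsi: Sequence[float],
    p_clin: Sequence[float],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[float, float, float]:
    """Fit P(CPR) = sigmoid(b0 + b1*logit(p_wsi) + b2*logit(p_clin)) by gradient descent on log loss."""
    feats = [(_logit(a), _logit(b)) for a, b in zip(p_wsi, p_clin)]
    b0 = b1 = b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for (x1, x2), t in zip(feats, y):
            err = _sigmoid(b0 + b1 * x1 + b2 * x2) - t  # d(log loss)/d(linear score)
            g0 += err
            g1 += err * x1
            g2 += err * x2
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
        b2 -= lr * g2 / n
    return b0, b1, b2


def predict_fusion(coef: Tuple[float, float, float], p_wsi: float, p_clin: float) -> float:
    """Fused CPR probability for one patient from the two branch probabilities."""
    b0, b1, b2 = coef
    return _sigmoid(b0 + b1 * _logit(p_wsi) + b2 * _logit(p_clin))


if __name__ == "__main__":
    # Toy branch outputs (not study data): 1 = CPR, 0 = non-CPR.
    y = [1, 1, 1, 0, 0, 0]
    p_wsi = [0.8, 0.7, 0.6, 0.3, 0.2, 0.4]
    p_clin = [0.7, 0.9, 0.5, 0.2, 0.4, 0.3]
    coef = fit_fusion(p_wsi, p_clin, y)
    print(coef, predict_fusion(coef, 0.65, 0.55))
```

In stacked designs like this, the fusion layer is usually fitted on branch predictions for patients the branches were not trained on, such as a held-out validation set, so that the fused weights are not inflated by overfitted branch outputs; the abstract does not state which data were used for this step.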
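
The evaluation metrics named in the abstract have standard definitions for a binary outcome (1 = CPR, 0 = non-CPR). A minimal sketch, assuming a 0.5 probability threshold for the threshold-dependent metrics (the abstract does not report the cut-off used) and computing AUC through its equivalence with the Mann-Whitney statistic; `binary_metrics` and the toy data are hypothetical:

```python
from typing import Dict, Sequence


def auc_mann_whitney(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC = probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """AUC, accuracy, sensitivity, specificity, F1 and Brier score; the threshold is an assumption."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yp in pairs if y == 1 and yp == 1)
    tn = sum(1 for y, yp in pairs if y == 0 and yp == 0)
    fp = sum(1 for y, yp in pairs if y == 0 and yp == 1)
    fn = sum(1 for y, yp in pairs if y == 1 and yp == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # Brier score: mean squared gap between predicted probability and observed outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "auc": auc_mann_whitney(y_true, y_prob),
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only (not study data).
    print(binary_metrics([1, 1, 0, 0, 0, 1], [0.9, 0.4, 0.3, 0.6, 0.1, 0.8]))
```

AUC and the Brier score use the predicted probabilities directly, whereas accuracy, sensitivity, specificity, and F1 change with the chosen threshold; a lower Brier score indicates more accurate, better-calibrated probabilities.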
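
SHAP explains a prediction with Shapley values: each feature's average marginal contribution to the model output over all coalitions of the other features. For a model with only a few inputs these values can be computed exactly by enumeration, as in the sketch below, which assumes that a feature "absent" from a coalition takes a baseline value (one common convention). The toy model and its features are hypothetical; for tree ensembles such as an XGBoost clinical branch, the SHAP library's TreeExplainer uses a tree-specific algorithm instead of this brute-force loop, whose cost grows as 2^n.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature of x for model f.

    A feature absent from a coalition is replaced by its baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for combo in itertools.combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of feature i
    return phi


if __name__ == "__main__":
    # Hypothetical 3-feature toy score with an interaction term; not a clinical model.
    def toy_model(z: Sequence[float]) -> float:
        return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]

    x, base = [1.0, 2.0, 1.0], [0.0, 0.0, 0.0]
    phi = exact_shapley(toy_model, x, base)
    # The attributions sum to toy_model(x) - toy_model(base), up to floating-point rounding.
    print(phi, sum(phi), toy_model(x) - toy_model(base))
```

Ranking features by their mean absolute Shapley value across patients is a common way to obtain a "top contributors" list such as the five variables reported in the Results.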
