Abstract:
Objective To develop an interpretable multimodal model that integrates whole-slide images (WSIs) and clinical features, and to validate its efficacy in predicting complete pathological response (CPR) in patients with lung cancer after neoadjuvant therapy.
Methods Clinicopathologic data and hematoxylin and eosin (H&E)-stained sections were retrospectively collected from patients with lung cancer who received neoadjuvant therapy between March 2015 and March 2025. For the WSI branch, five pathology foundation models (CTransPath, Virchow2, H-optimus-0, Phikon-V2, and UNI-V2) were compared as feature extractors within the CLAM-SB framework to identify the best-performing one. The clinical branch was built with an extreme gradient boosting (XGBoost) model. The two branches were then combined by decision-level fusion with logistic regression to establish a multimodal prediction model for CPR (MP-CPR). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1-score, and Brier score. Interpretability analysis was performed with attention heatmaps and the Shapley Additive Explanations (SHAP) algorithm.
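The decision-level fusion step can be illustrated with a minimal pure-Python sketch. It assumes that each branch outputs one CPR probability per patient (called `p_wsi` and `p_clin` here, hypothetical names), that these probabilities are converted to log-odds before fusion, and that the fusion weights are fitted on held-out branch predictions such as those from the validation set; all of these are assumptions for illustration, not details stated in the Methods. The helpers `fit_fusion` and `predict_fusion` are hypothetical, and the logistic regression is fitted by plain gradient descent with a small L2 penalty rather than with any particular library.

```python
import math


def logit(p, eps=1e-6):
    """Convert a probability to log-odds, clipping to avoid infinite values."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fusion(p_wsi, p_clin, y, lr=0.1, epochs=2000, l2=0.01):
    """Hypothetical fusion fit: P(CPR) = sigmoid(b0 + b1*logit(p_wsi) + b2*logit(p_clin)),
    learned by batch gradient descent on L2-regularized log-loss (intercept unpenalized)."""
    X = [(logit(a), logit(b)) for a, b in zip(p_wsi, p_clin)]
    b0 = b1 = b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for (x1, x2), t in zip(X, y):
            err = sigmoid(b0 + b1 * x1 + b2 * x2) - t  # d(log-loss)/d(score)
            g0 += err
            g1 += err * x1
            g2 += err * x2
        b0 -= lr * g0 / n
        b1 -= lr * (g1 / n + l2 * b1)
        b2 -= lr * (g2 / n + l2 * b2)
    return b0, b1, b2


def predict_fusion(coefs, p_wsi, p_clin):
    """Apply the fitted fusion layer to new pairs of branch probabilities."""
    b0, b1, b2 = coefs
    return [sigmoid(b0 + b1 * logit(a) + b2 * logit(b)) for a, b in zip(p_wsi, p_clin)]


# Hypothetical branch outputs on a held-out set (1 = CPR), not study data.
p_wsi = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.4]
p_clin = [0.8, 0.6, 0.9, 0.2, 0.4, 0.3, 0.7, 0.5]
y = [1, 1, 1, 0, 0, 0, 1, 0]

coefs = fit_fusion(p_wsi, p_clin, y)
fused = predict_fusion(coefs, p_wsi, p_clin)
print("intercept and weights:", [round(c, 3) for c in coefs])
print("fused P(CPR):", [round(p, 3) for p in fused])
```

Fitting the fusion layer on predictions that neither branch saw during its own training avoids rewarding a branch for overfitting; the learned coefficients `b1` and `b2` then act as data-driven weights on the WSI and clinical evidence.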
Results A total of 728 patients were enrolled. The internal cohort comprised 676 patients from the Fourth Hospital of Hebei Medical University and was randomly divided into a training set (n=536), a validation set (n=70), and an internal testing set (n=70) at a ratio of approximately 8:1:1. An additional 32 patients from the Affiliated Hospital of Hebei University and 20 patients from Handan First Hospital formed external validation sets 1 and 2, respectively. In the internal testing set, UNI-V2 was the best-performing feature extractor for the WSI branch, with an AUC of 0.774 (95% CI: 0.688–0.861). The resulting MP-CPR model achieved an AUC of 0.812 (95% CI: 0.725–0.899) in the internal testing set, outperforming both the WSI-only model (AUC 0.774) and the clinical-only model (AUC 0.780), with an accuracy of 81.8%, sensitivity of 70.6%, specificity of 86.5%, and Brier score of 0.125. In external validation set 1, MP-CPR achieved an AUC of 0.746 (95% CI: 0.598–0.898) and an accuracy of 75.0%; in external validation set 2, it achieved an AUC of 0.722 (95% CI: 0.538–0.918) and an accuracy of 80.0%. Attention heatmaps showed that the regions receiving the most attention were concentrated primarily within the tumor parenchyma and treatment-related peritumoral areas. SHAP analysis identified preoperative treatment regimen, pathological diagnosis, tumor-to-stroma ratio, age, and smoking history as the five features contributing most to CPR prediction.
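The threshold-dependent metrics reported above (accuracy, sensitivity, specificity, F1-score) and the threshold-free ones (AUC, Brier score) can be computed from predicted probabilities as in the following minimal sketch. The 0.5 decision threshold is an assumption, since the operating point is not stated here, and the AUC uses the Mann-Whitney formulation: the probability that a randomly chosen CPR case scores higher than a randomly chosen non-CPR case, with ties counted as one half. The helper names and the toy labels and probabilities are hypothetical and are not study data.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding predicted CPR probabilities."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    return num / den if den else float("nan")


def auc_mann_whitney(y_true, y_prob):
    """AUC as P(score of a CPR case > score of a non-CPR case); ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return safe_div(wins, len(pos) * len(neg))


def binary_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),  # recall on CPR cases
        "specificity": safe_div(tn, tn + fp),  # recall on non-CPR cases
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        # Brier score: mean squared gap between probability and 0/1 outcome (lower is better)
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
        "auc": auc_mann_whitney(y_true, y_prob),
    }


# Hypothetical toy data (1 = CPR, 0 = non-CPR), not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
metrics = binary_metrics(y_true, y_prob)
print({k: round(v, 4) for k, v in metrics.items()})
# accuracy 0.75, sensitivity 0.6667, specificity 0.8, f1 0.6667, brier 0.1203, auc 0.9333
```

Because accuracy, sensitivity, specificity, and F1-score all depend on the chosen threshold while AUC and Brier score do not, reporting both kinds gives a fuller picture of discrimination and calibration.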
Conclusion The MP-CPR model, developed and validated on multicenter data, outperformed single-modality models in predicting CPR in lung cancer. Its interpretability results are consistent with established clinicopathologic knowledge, suggesting that the model may serve as an auxiliary reference for evaluating treatment response and optimizing individualized treatment decisions in clinical practice.