Abstract:
Objective To build a deep learning-based model for diagnosing gastric cancer from gastric biopsy pathology slides and to evaluate its performance.

Methods Pathology slides from patients whose gastric biopsies were diagnosed as normal gastric mucosa, chronic gastritis, high-grade intraepithelial neoplasia, or gastric adenocarcinoma at Zhejiang Provincial People's Hospital between January 2015 and January 2020 were retrospectively collected. The slides were scanned at ×20 magnification to produce whole slide images (WSIs), which were randomly divided into a patch classification data set, a slide classification training set, and a slide classification test set at a ratio of 2:2:1. After the lesion regions in the patch classification data set were annotated and patches were extracted, the patches were randomly divided into a training set, a test set, and a validation set at a ratio of 20:1:1. Convolutional neural network (CNN) models for patch-level cancer versus non-cancer classification were built on the Efficientnet and ResNet architectures and evaluated on the patch test and validation sets by classification accuracy and area under the receiver operating characteristic curve (AUC). The patch-level predictions were stitched into a cancer heat map for each WSI, slide-level cancer versus non-cancer classification features were extracted from the heat map, and a LightGBM classifier was trained on these features to diagnose whole gastric biopsy slides. The slide-level results were evaluated by AUC, accuracy, sensitivity, and specificity.

Results A total of 500 pathology slides of benign gastric disease (normal gastric mucosa, chronic gastritis) and 500 pathology slides of gastric cancer (high-grade intraepithelial neoplasia, gastric adenocarcinoma) met the inclusion and exclusion criteria. The patch classification data set, slide classification training set, and slide classification test set contained 400, 400, and 200 WSIs, respectively. The patch classification training set, test set, and validation set contained 402 000, 20 000, and 20 000 patches, respectively. Of the network structures compared, the CNN model based on Efficientnet-b1 achieved the highest patch classification accuracy [test set: 91.3% (95% CI: 88.2%-95.4%); validation set: 92.5% (95% CI: 89.0%-95.3%)] and the highest AUC [test set: 0.95 (95% CI: 0.93-0.98); validation set: 0.96 (95% CI: 0.92-0.98)]. The LightGBM-based model identified whole slides as gastric cancer with an AUC of 0.98 (95% CI: 0.89-0.98), an accuracy of 88.0% (95% CI: 81.6%-94.3%), a sensitivity of 100% (95% CI: 88.0%-100%), and a specificity of 67.0% (95% CI: 57.0%-85.0%).

Conclusion The CNN diagnostic model built on gastric biopsy pathology slides can localize cancerous tissue and classify lesions at both the patch level and the slide level, identifying gastric cancer accurately. It has the potential to improve the efficiency of pathological diagnosis.
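The slide-level accuracy, sensitivity, and specificity reported in the abstract follow from a confusion matrix of predicted versus true slide labels. The sketch below shows that arithmetic on made-up labels; the function name `slide_metrics` and the toy data are illustrative assumptions, not the authors' code or results.

```python
from typing import Dict, Sequence


def slide_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = cancer, 0 = non-cancer)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer slides called cancer
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cancer slides called non-cancer
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cancer slides called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer slides that were missed

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true cancer slides that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true non-cancer slides that were cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for illustration only; these are NOT data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
toy = slide_metrics(toy_true, toy_pred)
print(toy)
```

In the toy data no cancer slide is missed, so sensitivity is 1.0, while two of the six non-cancer slides are flagged, which lowers specificity. A classifier tuned not to miss cancer will typically show this pattern of high sensitivity paired with lower specificity.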
Key words:
- convolutional neural network
- digital pathology
- gastric cancer
- diagnosis model
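Author contributions: Wang Jixian (王继仙) was responsible for data curation and analysis and for drafting the manuscript; Gui Kun (桂坤) and Chen Bingxian (陈炳宪) for carrying out the study and for data analysis; Ru Guoqing (茹国庆) for pathology slide review and study design; Zhao Di (赵地) for study design and data analysis; Chen Wanyuan (陈万远) and Zhang Zhiyong (张志勇) for pathology slide review, literature collation, and manuscript revision.
Conflicts of interest: All authors declare that they have no conflicts of interest.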
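Table 1 Performance of patch-level cancer versus non-cancer classification models built on five network structures

| Network structure | Accuracy, test set [% (95% CI)] | Accuracy, validation set [% (95% CI)] | AUC, test set (95% CI) | AUC, validation set (95% CI) |
| --- | --- | --- | --- | --- |
| Efficientnet-b1 | 91.3 (88.2-95.4) | 92.5 (89.0-95.3) | 0.95 (0.93-0.98) | 0.96 (0.92-0.98) |
| Efficientnet-b2 | 90.2 (87.3-95.1) | 91.6 (88.4-95.8) | 0.94 (0.92-0.98) | 0.95 (0.91-0.98) |
| Efficientnet-b3 | 89.5 (86.2-93.7) | 89.9 (86.7-93.4) | 0.94 (0.92-0.97) | 0.95 (0.91-0.98) |
| ResNet50 | 89.3 (85.3-93.8) | 91.3 (88.1-95.7) | 0.91 (0.88-0.94) | 0.93 (0.89-0.96) |
| ResNet101 | 88.2 (84.8-91.5) | 90.4 (87.4-94.8) | 0.90 (0.88-0.93) | 0.91 (0.88-0.95) |

AUC: area under the curve

Table 2 Cancer versus non-cancer classification features selected from the cancer heat map and foreground information

| No. of features | Feature description | Heat-map threshold |
| --- | --- | --- |
| 1 | Total area of the tumor connected components | 0.9 |
| 1 | Ratio of tumor connected-component area to foreground tissue area | 0.5 |
| 1 | Area of the largest tumor connected component | 0.5 |
| 1 | Longest-axis length of the largest tumor connected component | 0.5 |
| 1 | Total number of heat-map pixels | 0.5 |
| 1 | Mean, over all tumor regions, of the ratio of region pixels to bounding-box pixels | 0.9 |
| 5 | Maximum, mean, variance, skewness, and kurtosis of the tumor connected-component areas | 0.9 |
| 5 | Maximum, mean, variance, skewness, and kurtosis of the tumor connected-component perimeters | 0.9 |
| 5 | Maximum, mean, variance, skewness, and kurtosis of the ratio of region pixels to bounding-box pixels for each tumor region | 0.5 |
| 5 | Maximum, mean, variance, skewness, and kurtosis of the ratio of region pixels to convex-hull pixels for each tumor region | 0.9 |
| 5 | Maximum, mean, variance, skewness, and kurtosis of the eccentricity of the ellipse with the same second moments as each tumor connected component (ratio of focal distance to major-axis length) | 0.9 |

Table 3 The five tumor classification features most strongly correlated with the slide-level cancer versus non-cancer label

| No. | Feature description | Pearson correlation coefficient r |
| --- | --- | --- |
| 1 | Variance of the ratio of region pixels to convex-hull pixels for each tumor region | 0.852 |
| 2 | Skewness of the tumor connected-component areas | 0.835 |
| 3 | Longest-axis length of the largest tumor connected component | 0.833 |
| 4 | Skewness of the tumor connected-component perimeters | 0.823 |
| 5 | Area of the largest tumor connected component | 0.748 |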
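Several of the heat-map features in Table 2 can be illustrated with a small sketch: threshold the heat map, find the connected tumor regions, and summarize their geometry. The code below is a minimal pure-Python illustration under stated assumptions: 8-connectivity, a heat map given as a nested list of probabilities, and the hypothetical names `connected_components`, `extent`, and `heatmap_features`. The source does not specify the connectivity rule or the implementation, and the skewness, kurtosis, perimeter, convex-hull, and eccentricity features are omitted for brevity.

```python
from collections import deque
from statistics import mean, pvariance


def connected_components(mask):
    """Return the 8-connected regions of True cells, each as a list of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited tumor pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(pixels)
    return regions


def extent(pixels):
    """Ratio of region pixels to pixels in the region's bounding box."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pixels) / box_area


def heatmap_features(heatmap, foreground_pixels, threshold):
    """Summarize tumor regions of a probability heat map at one threshold."""
    mask = [[p >= threshold for p in row] for row in heatmap]
    regions = connected_components(mask)
    areas = [len(region) for region in regions]
    if not areas:  # no pixel reached the threshold
        return {"total_area": 0, "area_to_foreground": 0.0, "largest_area": 0,
                "mean_area": 0.0, "area_variance": 0.0, "mean_extent": 0.0}
    return {
        "total_area": sum(areas),
        "area_to_foreground": sum(areas) / foreground_pixels,
        "largest_area": max(areas),
        "mean_area": mean(areas),
        "area_variance": pvariance(areas),
        "mean_extent": mean(extent(region) for region in regions),
    }


# Toy 3x4 heat map for illustration only; values are NOT from the study.
toy_heatmap = [
    [0.95, 0.92, 0.10, 0.00],
    [0.91, 0.60, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.97],
]
features_09 = heatmap_features(toy_heatmap, foreground_pixels=12, threshold=0.9)
features_05 = heatmap_features(toy_heatmap, foreground_pixels=12, threshold=0.5)
print(features_09)
print(features_05)
```

Lowering the threshold from 0.9 to 0.5 admits the 0.60 pixel into the first region, which is why Table 2 pairs each feature with a specific threshold: the same heat map yields different region geometry at each cut-off. In a pipeline like the one described, one such feature vector per slide would then be passed to the slide-level classifier.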