Clinical Prediction Models Based on Traditional Methods and Machine Learning for Predicting First Stroke: Status and Prospects
Abstract: Stroke is the third leading cause of death and the fourth leading cause of disability worldwide. Its high disability rate and long recovery period severely impair patients' quality of life and place a heavy burden on families and society. Primary prevention is the cornerstone of stroke control, because early intervention on risk factors can effectively reduce incidence; models that predict the risk of first-ever stroke therefore have substantial clinical value. In recent years, advances in big data and artificial intelligence have opened new avenues for stroke risk prediction. This article reviews the current state of research on traditional methods and machine learning models for predicting first-ever stroke risk and outlines three directions for future development. First, technological innovation should be emphasized, incorporating advanced algorithms such as deep learning and large models to further improve predictive accuracy. Second, data types should be diversified and model architectures optimized to build more comprehensive and precise prediction models. Third, models should be clinically validated in real-world settings, which both strengthens their robustness and generalizability and improves physicians' understanding of them; this is crucial for their application and dissemination.
Keywords:
- stroke
- first stroke
- machine learning
- clinical prediction models
- primary prevention
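Author contributions: Zhang Zijiao collated the literature and wrote the manuscript; Ding Shunjing, Zhao Di, and Liang Jun proposed the topic; Lei Jianbo reviewed and revised the manuscript.
Conflict of interest: All authors declare no conflict of interest.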
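Table 1 An overview of common traditional first-ever stroke risk prediction models and scoring scales

| Model/scale | Included risk factors | Intended use | Model performance |
| --- | --- | --- | --- |
| Framingham stroke risk prediction model [9-10] | Original version: age, sex, systolic blood pressure, antihypertensive medication use, history of cardiovascular disease, current smoking status, atrial fibrillation, diabetes, and left ventricular hypertrophy on electrocardiogram. Revised version: age, sex, systolic blood pressure, antihypertensive medication use, history of cardiovascular disease, current smoking status, atrial fibrillation, and diabetes | Predicts the 10-year risk of stroke in adults aged ≥55 years | C statistic: 0.67-0.77 for the original version and 0.66-0.78 for the revised version; across all samples, the calibration (Hosmer-Lemeshow statistic) of the revised version was mostly better than that of the original |
| Stroke Riskometer™ app [11] | Sex, age, diabetes, systolic blood pressure, hypertension treatment, history of cardiovascular disease, smoking, alcohol consumption, atrial fibrillation, left ventricular hypertrophy, family history of stroke or cardiovascular disease, stress, physical activity, waist-to-hip ratio, body mass index, waist circumference, non-white ethnicity, diet, cognitive impairment, memory decline, history of traumatic brain injury | Predicts the 5-year or 10-year risk of first stroke in people aged over 20 years | C statistic: 0.51-0.56; D statistic: 0.01-0.12; AUC: 0.740 in men, 0.715 in women |
| CHADS2 score or CHA2DS2-VASc score [12-13] | CHADS2: congestive heart failure, hypertension, age ≥75 years, and diabetes, 1 point each; prior stroke or transient ischemic attack, 2 points. CHA2DS2-VASc: congestive heart failure/left ventricular dysfunction, hypertension, diabetes, vascular disease, age 65-74 years, and female sex, 1 point each; age ≥75 years and prior stroke/transient ischemic attack/thromboembolism, 2 points each. The points are summed to give the total score | Assesses the risk of ischemic stroke in patients with nonvalvular atrial fibrillation | C statistic: 0.606 for CHA2DS2-VASc, slightly higher than for CHADS2 (0.561) |
| QStroke score (UK) [14] | Ethnicity, age, sex, current smoking status, atrial fibrillation, systolic blood pressure, ratio of total cholesterol to high-density lipoprotein cholesterol, body mass index, family history of coronary artery disease, Townsend deprivation score, hypertension treatment, type 1 diabetes, type 2 diabetes, kidney disease, rheumatoid arthritis, coronary heart disease, congestive heart failure, valvular heart disease | Predicts the risk of first stroke or transient ischemic attack | R²: 55.1% in men, 57.3% in women; D statistic: 2.27 in men, 2.37 in women; AUC: 0.866 in men, 0.877 in women |
| Pooled Cohort Equations [15] | Age, sex, race, total cholesterol, high-density lipoprotein cholesterol, antihypertensive treatment, systolic blood pressure, history of diabetes, smoking | Estimates the 10-year risk of atherosclerotic cardiovascular disease (including fatal and nonfatal stroke) | C statistic: 0.766 in non-Hispanic white men, 0.784 in non-Hispanic white women, 0.713 in non-Hispanic African American men, 0.818 in non-Hispanic African American women |
| China-PAR stroke risk model [16] | Age, systolic blood pressure, current smoking status, diabetes, total cholesterol, high-density lipoprotein cholesterol, geographic region, urbanization, parental history of stroke (additional factor in men), waist circumference (additional factor in women) | Predicts an individual's 10-year and lifetime risk of first stroke | C statistic for 10-year risk: 0.810 in men, 0.810 in women; C statistic for lifetime risk: 0.789 in men, 0.798 in women |

AUC: area under the curve

Table 2 An overview of research on domestic and international machine learning models for predicting the risk of first-ever stroke

| Authors | Year | Data source | Algorithms | Intended use | Model performance |
| --- | --- | --- | --- | --- | --- |
| Chun et al. [27] | 2021 | China Kadoorie Biobank | Cox regression, logistic regression, support vector machine, random survival forest, gradient boosted trees, multilayer perceptron, and a new ensemble method (combining gradient boosted trees or Cox models) | Predicts the risk of first stroke in adults over the next 9 years | The new ensemble method performed best (accuracy: 76% in men, 80% in women; specificity: 76% in men, 81% in women; positive predictive value: 26% in men, 24% in women) |
| Qiu et al. [28] | 2023 | Electronic health records from 8 research centers and 14 communities in Jiangxi Province, China | Logistic regression, random forest, decision tree, extreme gradient boosting, and gradient boosted trees | Predicts the risk of first stroke in the population | Extreme gradient boosting (AUC 0.924, accuracy 87.3%, sensitivity 77.6%, specificity 91.6%) and random forest (AUC 0.924, accuracy 87.2%, sensitivity 77.8%, specificity 91.3%) performed better than the other models |
| Chang et al. [29] | 2023 | Rugao Longitudinal Ageing Study | Logistic regression, random forest, Gaussian-kernel support vector machine, multilayer perceptron, k-nearest neighbors, and gradient boosted trees | Predicts the risk of ischemic stroke in Chinese adults aged ≥60 years | The Gaussian-kernel support vector machine performed best (C statistic: 0.79) |
| Orfanoudaki et al. [30] | 2020 | Framingham Offspring cohort | A non-linear stroke risk score (N-SRS) developed from optimal classification trees | Predicts the 10-year risk of stroke in adults | Training set AUC: 0.874 (95% CI: 0.85-0.90); validation set AUC: 0.753 (95% CI: 0.74-0.76) |
| You et al. [31] | 2023 | Longitudinal population cohort from the UK Biobank | Artificial neural network, logistic regression, k-nearest neighbors, light gradient boosting machine, random forest, support vector machine, extreme gradient boosting | Predicts the 10-year risk of cardiovascular disease (including stroke) in adults | The light gradient boosting machine performed best (AUC: 0.762±0.010) |

AUC: as in Table 1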
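The point-based scoring rules that Table 1 lists for CHADS2 and CHA2DS2-VASc can be sketched in a few lines. This is a minimal illustration of the arithmetic only, assuming the point values given in Table 1; the `Patient` class, its field names, and the two function names are hypothetical, and a single flag is assumed to cover prior stroke, transient ischemic attack, or thromboembolism.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the risk factors named in Table 1."""
    age: int
    female: bool = False
    heart_failure: bool = False     # congestive heart failure / LV dysfunction
    hypertension: bool = False
    diabetes: bool = False
    vascular_disease: bool = False
    prior_stroke_tia: bool = False  # assumption: also covers thromboembolism


def chads2(p: Patient) -> int:
    """CHADS2: four 1-point items plus 2 points for prior stroke or TIA."""
    one_point_items = [
        p.heart_failure,
        p.hypertension,
        p.age >= 75,
        p.diabetes,
    ]
    return sum(one_point_items) + 2 * int(p.prior_stroke_tia)


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc: six 1-point items plus two 2-point items."""
    one_point_items = [
        p.heart_failure,
        p.hypertension,
        p.diabetes,
        p.vascular_disease,
        65 <= p.age <= 74,
        p.female,
    ]
    two_point_items = [
        p.age >= 75,
        p.prior_stroke_tia,
    ]
    return sum(one_point_items) + 2 * sum(two_point_items)


if __name__ == "__main__":
    example = Patient(age=78, female=True, hypertension=True, diabetes=True)
    print("CHADS2:", chads2(example))
    print("CHA2DS2-VASc:", cha2ds2_vasc(example))
```

Because both scores are plain sums of weighted indicators, they need no fitted coefficients, which is one reason such scales are easy to apply at the bedside compared with the regression-based and machine learning models in Tables 1 and 2.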
[1] GBD 2021 Stroke Risk Factor Collaborators. Global, regional, and national burden of stroke and its risk factors, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021[J]. Lancet Neurol, 2024, 23(10): 973-1003. DOI: 10.1016/S1474-4422(24)00369-7
[2] Owolabi M O, Thrift A G, Mahal A, et al. Primary stroke prevention worldwide: translating evidence into action[J]. Lancet Public Health, 2022, 7(1): e74-e85. DOI: 10.1016/S2468-2667(21)00230-9
[3] Crichton S L, Bray B D, McKevitt C, et al. Patient outcomes up to 15 years after stroke: survival, disability, quality of life, cognition and mental health[J]. J Neurol Neurosurg Psychiatry, 2016, 87(10): 1091-1098. DOI: 10.1136/jnnp-2016-313361
[4] Feigin V L, Owolabi M O. Pragmatic solutions to reduce the global burden of stroke: a World Stroke Organization-Lancet Neurology Commission[J]. Lancet Neurol, 2023, 22(12): 1160-1206.
[5] Sarikaya H, Ferro J, Arnold M. Stroke prevention--medical and lifestyle measures[J]. Eur Neurol, 2015, 73(3/4): 150-157. http://www.karger.com/Article/FullText/367652
[6] Goldstein L B, Bushnell C D, Adams R J, et al. Guidelines for the primary prevention of stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association[J]. Stroke, 2011, 42(2): 517-584. DOI: 10.1161/STR.0b013e3181fcb238
[7] Hu T, Cen W X, Li C, et al. Application and research progress of clinical prediction model in stroke[J]. Chin J Clin Res, 2023, 36(3): 386-390. (in Chinese)
[8] Deo R C. Machine learning in medicine[J]. Circulation, 2015, 132(20): 1920-1930. DOI: 10.1161/CIRCULATIONAHA.115.001593
[9] Wolf P A, D'Agostino R B, Belanger A J, et al. Probability of stroke: a risk profile from the Framingham Study[J]. Stroke, 1991, 22(3): 312-318. DOI: 10.1161/01.STR.22.3.312
[10] Dufouil C, Beiser A, McLure L A, et al. Revised Framingham stroke risk profile to reflect temporal trends[J]. Circulation, 2017, 135(12): 1145-1159. DOI: 10.1161/CIRCULATIONAHA.115.021275
[11] Parmar P, Krishnamurthi R, Ikram M A, et al. The Stroke Riskometer(TM) App: validation of a data collection tool and stroke risk predictor[J]. Int J Stroke, 2015, 10(2): 231-244. DOI: 10.1111/ijs.12411
[12] Gage B F, Waterman A D, Shannon W, et al. Validation of clinical classification schemes for predicting stroke: results from the National Registry of Atrial Fibrillation[J]. JAMA, 2001, 285(22): 2864-2870. DOI: 10.1001/jama.285.22.2864
[13] Lip G Y H, Nieuwlaat R, Pisters R, et al. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on atrial fibrillation[J]. Chest, 2010, 137(2): 263-272. DOI: 10.1378/chest.09-1584
[14] Hippisley-Cox J, Coupland C, Brindle P. Derivation and validation of QStroke score for predicting risk of ischaemic stroke in primary care and comparison with other risk scores: a prospective open cohort study[J]. BMJ, 2013, 346: f2573. DOI: 10.1136/bmj.f2573
[15] Goff D C Jr, Lloyd-Jones D M, Bennett G, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines[J]. Circulation, 2014, 129(25 Suppl 2): S49-S73.
[16] Xing X L, Yang X L, Liu F C, et al. Predicting 10-year and lifetime stroke risk in Chinese population[J]. Stroke, 2019, 50(9): 2371-2378. DOI: 10.1161/STROKEAHA.119.025553
[17] Grundy S M, D'Agostino R B Sr, Mosca L, et al. Cardiovascular risk assessment based on US cohort studies: findings from a National Heart, Lung, and Blood Institute workshop[J]. Circulation, 2001, 104(4): 491-496. DOI: 10.1161/01.CIR.104.4.491
[18] Chun M, Clarke R, Zhu T T, et al. Development, validation and comparison of multivariable risk scores for prediction of total stroke and stroke types in Chinese adults: a prospective study of 0.5 million adults[J]. Stroke Vasc Neurol, 2022, 7(4): 328-336. DOI: 10.1136/svn-2021-001251
[19] Zhang Y L, Fang X H, Guan S C, et al. Validation of 10-year stroke prediction scores in a community-based cohort of Chinese older adults[J]. Front Neurol, 2020, 11: 986. http://www.socolar.com/Article/Index?aid=100084717716&jid=100000009585
[20] Daidone M, Ferrantelli S, Tuttolomondo A. Machine learning applications in stroke medicine: advancements, challenges, and future prospectives[J]. Neural Regen Res, 2024, 19(4): 769-773. http://www.zhangqiaokeyan.com/academic-journal-cn_chinese-nerve-regeneration-research-english-version_thesis/02012101895978.html
[21] Obermeyer Z, Emanuel E J. Predicting the future-big data, machine learning, and clinical medicine[J]. N Engl J Med, 2016, 375(13): 1216-1219. DOI: 10.1056/NEJMp1606181
[22] Wang L D. Development and validation of an early warning model for new-onset stroke in Chinese populations at high risk of stroke[D]. Beijing: China Academy of Chinese Medical Sciences, 2023. (in Chinese)
[23] Sun Z J, Ji J, Ma Z Y, et al. Construction and application of stroke TCM pattern differentiation model based on machine learning[J]. J Hunan Univ Chin Med, 2023, 43(4): 694-699. DOI: 10.3969/j.issn.1674-070X.2023.04.019 (in Chinese)
[24] Wang Z L, Jiang M, Hu Y H, et al. An incremental learning method based on probabilistic neural networks and adjustable fuzzy clustering for human activity recognition by using wearable sensors[J]. IEEE Trans Inf Technol Biomed, 2012, 16(4): 691-699. DOI: 10.1109/TITB.2012.2196440
[25] Shehab M, Abualigah L, Shambour Q, et al. Machine learning in medical applications: a review of state-of-the-art methods[J]. Comput Biol Med, 2022, 145: 105458. DOI: 10.1016/j.compbiomed.2022.105458
[26] Wan H Y, Liu J, Hao S X, et al. Construction of a risk prediction model for stroke based on random forest algorithm in Nanjing, China[J]. J Environ Hyg, 2024, 14(7): 590-596. (in Chinese)
[27] Chun M, Clarke R, Cairns B J, et al. Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults[J]. J Am Med Inform Assoc, 2021, 28(8): 1719-1727. DOI: 10.1093/jamia/ocab068
[28] Qiu Y X, Cheng S Q, Wu Y H, et al. Development of rapid and effective risk prediction models for stroke in the Chinese population: a cross-sectional study[J]. BMJ Open, 2023, 13(3): e068045. DOI: 10.1136/bmjopen-2022-068045
[29] Chang H W, Zhang H, Shi G P, et al. Ischemic stroke prediction using machine learning in elderly Chinese population: the Rugao Longitudinal Ageing Study[J]. Brain Behav, 2023, 13(12): e3307. DOI: 10.1002/brb3.3307
[30] Orfanoudaki A, Chesley E, Cadisch C, et al. Machine learning provides evidence that stroke risk is not linear: the non-linear Framingham stroke risk score[J]. PLoS One, 2020, 15(5): e0232414. DOI: 10.1371/journal.pone.0232414
[31] You J, Guo Y, Kang J J, et al. Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study[J]. Stroke Vasc Neurol, 2023, 8(6): 475-485. DOI: 10.1136/svn-2023-002332
[32] Wan H Y, Hao S X, Liu J, et al. Advances in the application of machine learning for stroke risk prediction[J]. Chin J Prim Med Pharm, 2024, 31(8): 1275-1280. DOI: 10.3760/cma.j.cn341190-20231120-00462 (in Chinese)
[33] Du H J, Liu X Y, Xu M H, et al. Advances in the prognostic prediction of acute ischemic stroke: using machine learning predictive models as an example[J]. Chin Gen Pract, 2025, 28(5): 554-560. DOI: 10.12114/j.issn.1007-9572.2024.0090 (in Chinese)
[34] Zhang C Y, Zhu W L, Li X R, et al. The outcome prediction model of acute stroke: comparison between machine learning and traditional regression model[J]. Chin J CT MRI, 2023, 21(7): 24-26. (in Chinese)
[35] Hong C, Pencina M J, Wojdyla D M, et al. Predictive accuracy of stroke risk prediction models across black and white race, sex, and age groups[J]. JAMA, 2023, 329(4): 306-317. DOI: 10.1001/jama.2022.24683
[36] Nijman S W, Leeuwenberg A M, Beekers I, et al. Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review[J]. J Clin Epidemiol, 2022, 142: 218-229. DOI: 10.1016/j.jclinepi.2021.11.023
[37] Janiesch C, Zschech P, Heinrich K. Machine learning and deep learning[J]. Electron Mark, 2021, 31(3): 685-695. DOI: 10.1007/s12525-021-00475-2
[38] Kamnitsas K, Ledig C, Newcombe V F J, et al. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation[J]. Med Image Anal, 2017, 36: 61-78. DOI: 10.1016/j.media.2016.10.004
[39] Shin H C, Roth H R, Gao M C, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning[J]. IEEE Trans Med Imaging, 2016, 35(5): 1285-1298. http://www.onacademic.com/detail/journal_1000038688910210_9161.html
[40] Moulaei K, Afshari L, Moulaei R, et al. Explainable artificial intelligence for stroke prediction through comparison of deep learning and machine learning models[J]. Sci Rep, 2024, 14(1): 31392. DOI: 10.1038/s41598-024-82931-5
[41] Luo N, Shi W Y, Yang Z Y, et al. Multimodal fusion of brain imaging data: methods and applications[J]. Mach Intell Res, 2024, 21(1): 136-152. http://www.mi-research.net/article/exportPdf?id=7e91c5f9-4816-4be2-9670-ceec939a1ab2
[42] Colangelo G, Ribo M, Montiel E, et al. PRERISK: a personalized, artificial intelligence-based and statistically-based stroke recurrence predictor for recurrent stroke[J]. Stroke, 2024, 55(5): 1200-1209.
[43] Zhen Z Y, Liu L, Wu W, et al. Research progress on risk prediction model of first stroke[J]. Chin Evid Based Nurs, 2023, 9(4): 644-647. (in Chinese)
[44] Xia X, Mu W, Li Y F, et al. Approaches to the mining of traditional Chinese medical experts' case histories using machine learning techniques[J]. New Med, 2024, 34(4): 448-457. (in Chinese)
[45] She K J, Yuan N J, Ma Q Y, et al. The development status, problems, and solutions of machine learning driven intelligence in traditional Chinese medicine diagnosis[J]. J Basic Chin Med, 2024, 30(3): 398-406. (in Chinese)
[46] Xu M D, Ma X C, Wen Z L, et al. Application of support vector machine in the diagnosis of hypertension in TCM syndrome[J]. China J Tradit Chin Med Pharm, 2017, 32(6): 2497-2500. (in Chinese)
[47] Collins G S, Dhiman P, Ma J, et al. Evaluation of clinical prediction models (part 1): from development to external validation[J]. BMJ, 2024, 384: e074819.
[48] Riley R D, Archer L, Snell K I E, et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study[J]. BMJ, 2024, 384: e074820.
[49] Riley R D, Snell K I E, Archer L, et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study[J]. BMJ, 2024, 384: e074821.