Lipidomics Combined with Machine Learning for Screening Biomarkers of Early-stage Lung Cancer in the Elderly

Abstract: Objective To screen molecular biomarkers for the diagnosis of early-stage lung cancer in elderly patients by combining plasma lipidomics with machine learning, and to evaluate their diagnostic performance.
Methods This retrospective diagnostic accuracy study had two parts. The first part was molecular biomarker screening. Elderly patients with early-stage lung cancer (early lung cancer group), patients with benign pulmonary nodules (benign nodule group), and healthy individuals undergoing physical examination during the same period (healthy control group) were enrolled at Peking University People's Hospital between November 2023 and November 2024. In addition, early-stage lung cancer patients and healthy controls from a previous study by our research group who met the inclusion criteria of the present study served as an independent validation cohort. Plasma samples were collected from all subjects and analyzed by untargeted lipidomics using high-performance liquid chromatography-mass spectrometry. Principal component analysis and orthogonal partial least squares discriminant analysis were used to evaluate metabolic differences between groups, and an L1-regularized support vector machine combined with incremental feature selection was used to screen diagnostic biomarkers for early-stage lung cancer. Model performance was assessed with receiver operating characteristic curves, calibration curves, Brier scores, and decision curve analysis. The second part was functional validation of the biomarkers in the human lung adenocarcinoma cell line A549: palmitoylcarnitine (CAR 16:0) was selected as a representative biomarker and tested with CCK-8 and cell scratch assays.
Results The screening analysis included 36 patients in the early lung cancer group, 35 patients in the benign nodule group, and 41 healthy controls; the independent validation cohort comprised 110 individuals (59 patients with early-stage lung cancer and 51 healthy controls). In the principal component analysis, the quality control samples clustered tightly at the center of all samples, indicating good instrument stability and reliable data quality. Orthogonal partial least squares discriminant analysis showed a clear metabolic difference between the early lung cancer group and the control group (benign nodule group + healthy control group) (R2X=0.406, R2Y=0.529, Q2Y=0.44). The L1-regularized support vector machine selected five acylcarnitine lipids as diagnostic biomarkers for early-stage lung cancer: palmitoleoylcarnitine (CAR 16:1), palmitoylcarnitine (CAR 16:0), α-linolenoylcarnitine (CAR 18:3), linoleoylcarnitine (CAR 18:2), and oleoylcarnitine (CAR 18:1), each with a selection stability >98%. In the screening scenario (early lung cancer group vs. benign nodule group + healthy control group), the model built on these five biomarkers achieved an area under the curve (AUC) of 0.895 (95% CI: 0.700-1.000) for diagnosing early-stage lung cancer, with a sensitivity of 98.4%, a specificity of 63.9%, and an accuracy of 75.0%. For differentiating early-stage lung cancer from benign pulmonary nodules, the AUC was 0.877 (95% CI: 0.797-0.965), with a sensitivity of 86.1% and a specificity of 80.0%. For differentiating early-stage lung cancer from healthy controls, the AUC was 0.929 (95% CI: 0.877-0.988), with a sensitivity of 94.4% and a specificity of 85.4%. Calibration curves and decision curve analysis showed that the model was well calibrated and, overall, provided a net benefit for patients with early-stage lung cancer. In the independent validation cohort, the model achieved an AUC of 0.874 (95% CI: 0.781-0.940) for diagnosing early-stage lung cancer, with a sensitivity of 86.4%, a specificity of 82.4%, and an accuracy of 86.4%. In vitro, palmitoylcarnitine inhibited the proliferation and migration of A549 cells, with a half-maximal inhibitory concentration of 55.04 μmol/L.
Conclusions The five plasma acylcarnitine lipids identified by untargeted lipidomics and machine learning may serve as potential molecular biomarkers for the diagnosis of early-stage lung cancer in elderly patients. The high sensitivity of the model makes it particularly suitable for early-stage lung cancer screening.
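
The performance measures named in the abstract (AUC, sensitivity, specificity, accuracy, Brier score, and the net benefit used in decision curve analysis) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the 0.5 classification cutoff, and the ten toy labels and predicted probabilities are all assumptions made up for illustration and are not data from the study.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def confusion_counts(labels: Sequence[int], probs: Sequence[float],
                     threshold: float) -> Dict[str, int]:
    """Count TP/FP/TN/FN when predicting 'cancer' for probability >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, probs):
        predicted = p >= threshold
        if y == 1:
            counts["tp" if predicted else "fn"] += 1
        else:
            counts["fp" if predicted else "tn"] += 1
    return counts


def classification_metrics(labels: Sequence[int], probs: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at an assumed cutoff."""
    c = confusion_counts(labels, probs, threshold)
    return {
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
        "accuracy": (c["tp"] + c["tn"]) / len(labels),
    }


def net_benefit(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    c = confusion_counts(labels, probs, threshold)
    n = len(labels)
    return c["tp"] / n - c["fp"] / n * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = early-stage lung cancer, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.3, 0.2, 0.1, 0.1]

if __name__ == "__main__":
    print("AUC:", round(roc_auc(labels, probs), 3))
    print("Brier score:", round(brier_score(labels, probs), 3))
    print("Metrics at 0.5:", classification_metrics(labels, probs))
    print("Net benefit at 0.2:", round(net_benefit(labels, probs, 0.2), 3))
```

Lowering the cutoff trades specificity for sensitivity, which is the trade-off behind the high-sensitivity, lower-specificity profile reported for the screening scenario. Plotting `net_benefit` over a range of thresholds gives a decision curve.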

     
