A Comparative Study of Artificial Intelligence-based Classification Versus Manual Classification of Medical Adverse Events: Taking the DeepSeek Large Language Model As an Example

WANG Rui; TAN Xutong; ZHAO Congpu; WANG Shuchang; CHEN Zheng; MA Xiaojun; CAI Zhiling

doi:10.12290/xhyxzz.2025-0371

WANG Rui, TAN Xutong, ZHAO Congpu, WANG Shuchang, CHEN Zheng, MA Xiaojun, CAI Zhiling. A Comparative Study of Artificial Intelligence-based Classification Versus Manual Classification of Medical Adverse Events: Taking the DeepSeek Large Language Model As an Example[J]. Medical Journal of Peking Union Medical College Hospital. DOI: 10.12290/xhyxzz.2025-0371

Abstract

Objective: To evaluate the application value of artificial intelligence (AI)-based classification in the categorization of medical adverse events.

Methods: Medical adverse events reported to the Adverse Event Reporting System of Peking Union Medical College Hospital from September 1, 2023, to August 31, 2024, were retrospectively collected. Events meeting the inclusion criteria were de-identified and then classified twice: by conventional manual classification and by AI-based classification using a large language model (DeepSeek-R1 Full-Context Internet Edition). The time required by each method was recorded, and the agreement and discrepancies between the two methods were compared. Using manual classification as the gold standard, the accuracy of AI-based classification was comprehensively evaluated.

Results: A total of 273 medical adverse events were analyzed. Manual classification took 38,838 seconds in total, with an average of 14.22 seconds per event. AI-based classification took 600 seconds in total, with an average of 2.19 seconds per event. The two methods agreed on 202 events and disagreed on 71, yielding an overall agreement rate of 73.99% and a Kappa coefficient of 0.646 (95% CI: 0.575–0.717; standard error 0.0362). Using manual classification as the gold standard, AI-based classification achieved accuracy ranging from 80% to 100%, precision from 30% to 100%, recall from 40% to 100%, F1 scores from 0.46 to 0.79, and specificity from 46% to 98% across categories. Notably, AI-based classification showed balanced and overall excellent performance in the categorization of device-related and drug-related adverse events.

Conclusion: The DeepSeek large language model can help improve the efficiency of medical adverse event classification and shows promising application potential, particularly for device-related and drug-related adverse events.
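The agreement statistics in the Results can be checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the authors' analysis code: the function names are hypothetical, the confidence interval assumes the usual normal approximation (Kappa ± 1.96 × standard error), and the per-category metrics assume a one-vs-rest confusion table with manual classification as the reference. The abstract does not report per-category counts, so the last function is shown without data.

```python
import math


def percent_agreement(n_agree: int, n_total: int) -> float:
    """Percentage of events that both methods placed in the same category."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_agree / n_total


def kappa_confidence_interval(kappa: float, standard_error: float, z: float = 1.96):
    """Normal-approximation interval: kappa +/- z * SE (z = 1.96 for 95%)."""
    half_width = z * standard_error
    return kappa - half_width, kappa + half_width


def _ratio(numerator: float, denominator: float) -> float:
    """Division that yields NaN instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else math.nan


def one_vs_rest_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Metrics for one category, treating manual classification as the truth.

    tp: AI and manual both assign the category
    fp: AI assigns it, manual does not
    fn: manual assigns it, AI does not
    tn: neither assigns it
    """
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Figures taken from the abstract: 202 of 273 events classified identically.
    print(f"agreement: {percent_agreement(202, 273):.2f}%")  # agreement: 73.99%

    # Reported Kappa 0.646 with standard error 0.0362.
    low, high = kappa_confidence_interval(0.646, 0.0362)
    print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.575 to 0.717
```

Both printed values reproduce the figures reported in the abstract, which suggests the interval was derived from the standard error in this way.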
