ZHOU Yuankai, LIU Pei, LIU Shengjun, YANG Yingying, YUAN Siyi, HE Huaiwu, LONG Yun. The Application of Generative Artificial Intelligence in the Assessment of Critical Care Medicine for Standardized Resident Physician Training[J]. Medical Journal of Peking Union Medical College Hospital. DOI: 10.12290/xhyxzz.2024-0739
Objective To explore the effectiveness of generative artificial intelligence (GAI) in the standardized training assessment of critical care medicine residents.
Methods The study subjects were residents undergoing standardized training in the critical care medicine departments of Peking Union Medical College Hospital and Beijing Friendship Hospital from June to September 2024, as well as teaching physicians qualified to supervise standardized training. Two sets of GAI-generated examination papers (using Tongyi Qianwen 2.5) and one set of human-generated examination papers were administered to all residents. The answers were graded separately by teaching physicians and by Tongyi Qianwen 2.5. Grading results from human and GAI evaluations were compared, and feedback on the GAI-generated and human-generated papers was collected from both residents and teaching physicians.
Results A total of 35 residents and 11 teaching physicians were included in the study. Residents' scores on single-choice questions from the two GAI-generated papers were significantly higher than those from the human-generated paper (both P<0.05), while their scores on multiple-choice questions were significantly lower (both P<0.05). There were no statistically significant differences in the grading of short-answer questions among the three papers (all P>0.05). In subjective evaluations, both teaching physicians (P=0.007) and residents (P=0.008) perceived the GAI-generated papers as less difficult. However, there were no significant differences in content accuracy or alignment with the training syllabus between the GAI-generated and human-generated papers (all P>0.05).
Conclusions GAI performs comparably to human examiners in examination paper creation and grading, but question difficulty requires further optimization. GAI holds promise as a valuable tool for enhancing the efficiency of resident teaching assessments.