Zhejiang University, National Institutes for Food and Drug Control, Shanghai Changzheng Hospital. Expert Consensus on General Methods for Performance Evaluation of Artificial Intelligence Medical Devices (2023)[J]. Medical Journal of Peking Union Medical College Hospital, 2023, 14(3): 494-503. DOI: 10.12290/xhyxzz.2023-0137

Expert Consensus on General Methods for Performance Evaluation of Artificial Intelligence Medical Devices (2023)

Funds: National Research and Development Program of China (2019YFC0118800)

  • Corresponding authors: LIU Shiyuan, E-mail: cjr.liushiyuan@vip.163.com
    LI Jingli, E-mail: lijli@nifdc.org.cn
    WU Jian, E-mail: wujian2000@zju.edu.cn
    1. Department of Radiology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai 200003, China
    2. Medical Device Inspection Institute, National Institutes for Food and Drug Control, Beijing 102629, China
    3. School of Public Health, Zhejiang University, Hangzhou 310027, China

  • Received Date: March 18, 2023
  • Accepted Date: May 07, 2023
  • Available Online: May 15, 2023
  • Issue Publish Date: May 29, 2023
  • Artificial intelligence medical devices are evolving rapidly, and the methods used to evaluate their performance need to be standardized and updated. To promote the industry, support regulatory supervision, and improve product quality, Zhejiang University, in cooperation with professional institutions including the National Institutes for Food and Drug Control and drawing on the centralized unit of artificial intelligence medical device standardization technology, led an effort to analyze the common problems in performance evaluation of these devices and to summarize the related test methods. Based on the consensus of the expert group, this paper describes the test methods and their applications in detail and explains how test data are sampled. The aim is to unify understanding, promote the standardization of performance evaluation methods for artificial intelligence medical devices, and ultimately support their high-quality development.
