The "Deceptiveness" of Medical Big Data and Countermeasures

Abstract: Research on and applications of medical big data are becoming increasingly widespread. Undoubtedly, however, medical big data is itself deceptive to some degree and, in certain special scenarios, can lead to wrong conclusions and adverse effects. This paper first analyzes the causes of this deceptiveness from two angles: the deceptiveness inherent in the data and the potential pitfalls of machine learning. It then explains, from a statistical perspective, how to avoid big data pitfalls. From a modeling perspective, it analyzes strategies for responding to attacks on models and discusses the importance of model interpretability in the medical field and methods for achieving it.