Abstract
The presence of missing data at the time of prediction limits the application of risk models in clinical and research settings. Common ways of handling missing data at the time of prediction include measuring the missing value and employing statistical methods. Measuring the missing value incurs additional cost, whereas previously reported statistical methods result in reduced performance compared with when all variables are measured. To tackle these challenges, we introduce a new strategy, the MMTOP algorithm (Multiple Models for Missing values at Time Of Prediction), which requires neither measuring additional data elements nor data imputation. Specifically, at model construction time, MMTOP constructs multiple predictively equivalent risk models utilizing different risk factor sets. The collection of models is stored and queried at prediction time. To predict an individual's risk in the presence of incomplete data, MMTOP selects a risk model from the collection of predictively equivalent models based on measurement availability for that individual and makes the risk prediction with the selected model. We illustrate MMTOP with severe hypoglycemia (SH) risk prediction based on data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study. We identified 77 predictively equivalent models for SH with a cross-validated c-index of 0.77 ± 0.03. These models are based on 77 distinct risk factor sets containing 12–17 risk factors. In terms of handling missing data at the time of prediction, MMTOP outperforms all four tested competitor methods and maintains consistent performance as the number of missing variables increases.
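The prediction-time step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RiskModel` structure, the `make_logistic` helper, the "first applicable model" selection rule, and the behavior when no model applies are all assumptions made for illustration, and any feature names or coefficients passed in are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

Record = Dict[str, Optional[float]]  # None marks a value missing at prediction time


@dataclass(frozen=True)
class RiskModel:
    """One member of the stored model collection (hypothetical structure)."""
    features: FrozenSet[str]             # risk factors this model requires
    predict: Callable[[Record], float]   # maps a record to a risk in (0, 1)


def make_logistic(intercept: float, weights: Dict[str, float]) -> RiskModel:
    """Build a toy logistic risk model; the coefficients are illustrative only."""
    def predict(record: Record) -> float:
        z = intercept + sum(w * record[name] for name, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))
    return RiskModel(frozenset(weights), predict)


def select_model(models: List[RiskModel], record: Record) -> Optional[RiskModel]:
    """Return the first model whose required risk factors are all measured.

    Assumption: the abstract does not say how to choose among several
    applicable models, so this sketch simply takes the first one.
    """
    available = {name for name, value in record.items() if value is not None}
    for model in models:
        if model.features <= available:
            return model
    return None


def predict_risk(models: List[RiskModel], record: Record) -> Optional[float]:
    """Predict risk with a model matched to the available measurements.

    Assumption: returns None when no stored model fits the record.
    """
    model = select_model(models, record)
    return None if model is None else model.predict(record)
```

Because every stored model is assumed to be predictively equivalent, the order of the collection only matters as a tie-breaker; a real system would need an explicit rule for that choice and for records that no model covers.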
Original language | English (US) |
---|---|
Article number | 103379 |
Journal | Journal of Biomedical Informatics |
Volume | 103 |
DOIs | |
State | Published - Mar 2020 |
Funding
Contributions: SM, ERS, PJS, and LSC designed the study. SM and MU performed the data analysis. SM, RZ, ERS, PJS, and LSC critically reviewed the manuscript and provided intellectual content and feedback. All authors reviewed the manuscript before submission. SM and LSC are the guarantors of the study and take responsibility for the contents of the article. Part of the results of the current manuscript was published in abstract form at the 78th American Diabetes Association Scientific Sessions [61]. We thank the editor, the reviewers, and our colleagues Erich Kummerfeld and Gyorgy Simon for their critical review and constructive feedback on the paper. Their contributions were instrumental in improving the clarity and scientific rigor of the paper. The study is supported by funding from the University of Minnesota, Academic Health Center. NIH grant NCRR 1UL1TR002494-01 partially supports Dr. Ma's time on this project. Dr. Ma also received funding from NIH grants 1R01MH116156-01A1, 1R03MH117254-01, and 1U79SM080049-01 during the period of this study. The ACCORD study was supported by contracts from the National Heart, Lung, and Blood Institute (N01-HC-95178, N01-HC-95179, N01-HC-95180, N01-HC-95181, N01-HC-95182, N01-HC-95183, N01-HC-95184, IAA-Y1-HC-9035, and IAA-Y1-HC-1010); by other components of the National Institutes of Health, including the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute on Aging, and the National Eye Institute; by the Centers for Disease Control and Prevention; and by General Clinical Research Centers. The following companies provided study medications, equipment, or supplies: Abbott Laboratories, Amylin Pharmaceutical, AstraZeneca, Bayer HealthCare, Closer Healthcare, GlaxoSmithKline, King Pharmaceuticals, Merck, Novartis, Novo Nordisk, Omron Healthcare, Sanofi-Aventis, and Schering-Plough.
This Manuscript was prepared using ACCORD Research Materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the ACCORD or the NHLBI. The authors thank the staff and participants of the ACCORD study for their important contributions.
Keywords
- Missing data
- Risk factors
- Risk modeling
- T2DM
ASJC Scopus subject areas
- Health Informatics
- Computer Science Applications