Evaluating the state of the art in missing data imputation for clinical data

Yuan Luo*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

29 Scopus citations


Clinical data are increasingly mined to derive new medical knowledge, with the goal of enabling greater diagnostic precision, better-personalized therapeutic regimens, improved clinical outcomes and more efficient utilization of health-care resources. However, clinical data are often available only at irregular intervals that vary between patients and types of data, and entries are often unmeasured or unknown. As a result, missing data are one of the major impediments to optimal knowledge derivation from clinical data. The Data Analytics Challenge on Missing data Imputation (DACMI) presented a shared clinical dataset with ground truth for evaluating and advancing the state of the art in imputing missing data for clinical time series. We extracted 13 commonly measured blood laboratory tests. To evaluate imputation performance, we randomly removed one recorded result per laboratory test per patient admission and used the removed results as the ground truth. To the best of our knowledge, DACMI is the first shared-task challenge on clinical time series imputation. The challenge attracted 12 international teams spanning three continents across multiple industries and academia. The evaluation outcome suggests that competitive machine learning and statistical models (e.g. LightGBM, MICE and XGBoost), coupled with carefully engineered temporal and cross-sectional features, can achieve strong imputation performance. However, care is needed to avoid excessive model complexity. Collectively, the participating systems experimented with a wide range of machine learning and probabilistic algorithms to combine temporal imputation and cross-sectional imputation, and their design principles will inform future efforts to better model clinical missing data.
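The evaluation setup described in the abstract (randomly removing one recorded result per laboratory test per patient admission and keeping it as ground truth) can be illustrated with a minimal sketch. This is not the challenge's actual code: the record layout (`admission_id`, `time`, `test`, `value`), the function name `mask_one_per_test`, the fixed random seed, and the `min_observed` guard that skips tests with too few recorded results are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def mask_one_per_test(records, seed=0, min_observed=2):
    """Hide one recorded value per (admission, test) pair.

    records: list of dicts with the hypothetical keys
        "admission_id", "time", "test", "value" (None when not measured).
    min_observed: assumption of this sketch, not stated in the abstract;
        pairs with fewer recorded values are left untouched so that at
        least one observation remains after masking.
    Returns (masked_records, ground_truth), where ground_truth maps the
    index of each masked record to its original value.
    """
    rng = random.Random(seed)

    # Collect the indices of recorded (non-missing) results per pair.
    observed = defaultdict(list)
    for idx, rec in enumerate(records):
        if rec["value"] is not None:
            observed[(rec["admission_id"], rec["test"])].append(idx)

    masked = [dict(rec) for rec in records]  # copy; input stays unchanged
    ground_truth = {}
    for key in sorted(observed):
        indices = observed[key]
        if len(indices) < min_observed:
            continue
        chosen = rng.choice(indices)
        ground_truth[chosen] = records[chosen]["value"]
        masked[chosen]["value"] = None
    return masked, ground_truth


# Toy example with made-up values, for illustration only.
records = [
    {"admission_id": 1, "time": 0, "test": "glucose", "value": 5.1},
    {"admission_id": 1, "time": 1, "test": "glucose", "value": 5.4},
    {"admission_id": 1, "time": 0, "test": "sodium", "value": 140.0},
    {"admission_id": 1, "time": 1, "test": "sodium", "value": None},
    {"admission_id": 2, "time": 0, "test": "glucose", "value": 6.0},
    {"admission_id": 2, "time": 1, "test": "glucose", "value": 6.2},
]
masked, ground_truth = mask_one_per_test(records)
```

An imputation method would then be run on `masked`, and its predictions at the indices in `ground_truth` compared with the held-out values. Natively missing entries (such as the second sodium result above) have no ground truth and are not scored.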

Original language: English (US)
Article number: bbab489
Journal: Briefings in Bioinformatics
Issue number: 1
State: Published - Jan 1 2022


Keywords

  • clinical laboratory test
  • machine learning
  • missing data imputation
  • time series

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

