A retrospective cohort analysis leveraging augmented intelligence to characterize long COVID in the electronic health record: A precision medicine framework

The Consortium for Clinical Characterization of COVID-19 by EHR (4CE)

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening for clinical trial recruitment, improve the reliability of disease estimates, and allow for more accurate downstream cohort analysis. In this retrospective cohort study, we analyzed the electronic health records (EHR) of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after their SARS-CoV-2 infection. We implemented distributed representation learning powered by the Machine Learning for modeling Health Outcomes (MLHO) framework to identify novel EHR features that could suggest PASC symptoms outside of typical diagnosis codes. MLHO applies entropy-based feature selection and boosting algorithms for representation mining. These improved definitions were then used for estimating PASC among hospitalized patients. A total of 30,422 hospitalized patients were diagnosed with COVID-19 across three healthcare systems between March 13, 2020 and February 28, 2021. The mean age of the population was 62.3 years (SD, 21.0 years) and 15,124 (49.7%) were female. We implemented the distributed representation learning technique to augment PASC definitions. These definitions were found to have positive predictive values of 0.73, 0.74, and 0.91 for dyspnea, fatigue, and joint pain, respectively. We estimated that 25 percent (95% CI: 6–48), 11 percent (95% CI: 6–15), and 13 percent (95% CI: 8–17) of hospitalized COVID-19 patients will have dyspnea, fatigue, and joint pain, respectively, 3 months or longer after a COVID-19 diagnosis. We present a validated framework for screening and identifying patients with PASC in the EHR and then use the tool to estimate its prevalence among hospitalized COVID-19 patients.
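
The abstract mentions entropy-based feature selection without giving details. As a rough illustration of the general idea only, and not MLHO's actual implementation, the sketch below ranks binary EHR features by information gain against a symptom label. The feature names, the toy data, and the helper functions are all hypothetical, and the boosting step is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(records, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {name: information_gain(values, labels) for name, values in records.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = code present in the patient's record, 0 = absent.
toy_records = {
    "hypothetical_code_a": [1, 1, 1, 0, 0, 0],
    "hypothetical_code_b": [1, 0, 1, 0, 1, 0],
}
# Made-up labels: 1 = symptom present, 0 = symptom absent.
toy_labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(toy_records, toy_labels)
```

In this toy example the first feature separates the labels perfectly and so ranks highest; a feature that carries little information about the label receives a gain near zero and would be a candidate for removal.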

Original language: English (US)
Article number: e0000301
Journal: PLOS Digital Health
Volume: 2
Issue number: 7 July
State: Published - Jul 2023

Funding

ZSH is supported by National Institutes of Health (NIH) National Library of Medicine (NLM) T15 LM007092. JK is supported by National Center for Advancing Translational Sciences (NCATS) UL1TR001857. KBW is supported by NIH R01 HL151643. YL is supported by NCATS U01TR003528 and NLM 1R01LM013337. DWH is supported by NCATS UL1 TR001998. GSO is supported by NIH grants P30ES017885 and U24CA210967. ZX is supported by National Institute of Neurological Disorders and Stroke (NINDS) R01NS098023 and NINDS R01NS124882. JHH is supported by NCATS UL1-TR001878. HE is supported by National Institute of Allergy and Infectious Diseases (NIAID) R01AI165535 and National Institute on Aging (NIA) RF1AG074372. SM is supported by NCATS UL1TR001857. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

ASJC Scopus subject areas

  • Health Informatics
