TY - JOUR
T1 - A machine learning-based phenotype for long COVID in children
T2 - An EHR-based study from the RECOVER program
AU - Lorman, Vitaly
AU - Razzaghi, Hanieh
AU - Song, Xing
AU - Morse, Keith
AU - Utidjian, Levon
AU - Allen, Andrea J.
AU - Rao, Suchitra
AU - Rogerson, Colin
AU - Bennett, Tellen D.
AU - Morizono, Hiroki
AU - Eckrich, Daniel
AU - Jhaveri, Ravi
AU - Huang, Yungui
AU - Ranade, Daksha
AU - Pajor, Nathan
AU - Lee, Grace M.
AU - Forrest, Christopher B.
AU - Bailey, L. Charles
N1 - Publisher Copyright: © 2023 Lorman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2023/8
Y1 - 2023/8
AB - As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
UR - http://www.scopus.com/inward/record.url?scp=85167657272&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85167657272&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0289774
DO - 10.1371/journal.pone.0289774
M3 - Article
C2 - 37561683
AN - SCOPUS:85167657272
SN - 1932-6203
VL - 18
JO - PLOS ONE
JF - PLOS ONE
IS - 8
M1 - e0289774
ER -