TY - JOUR
T1 - Clinical, social, and policy factors in COVID-19 cases and deaths
T2 - methodological considerations for feature selection and modeling in county-level analyses
AU - Madlock-Brown, Charisse
AU - Wilkens, Ken
AU - Weiskopf, Nicole
AU - Cesare, Nina
AU - Bhattacharyya, Sharmodeep
AU - Riches, Naomi O.
AU - Espinoza, Juan
AU - Dorr, David
AU - Goetz, Kerry
AU - Phuong, Jimmy
AU - Sule, Anupam
AU - Kharrazi, Hadi
AU - Liu, Feifan
AU - Lemon, Cindy
AU - Adams, William G.
N1 - Funding Information:
We would like to extend our special appreciation to Harold Lehmann of Johns Hopkins University for his support in conceiving the idea for this work. We would like to thank Tony Solomonides of Northshore University Health System and Daniella Meeker of the University of Southern California for helpful feedback. We also thank Dr. Nina Cesare and colleagues at the Boston University School of Public Health for access to the public dataset, which was created with the support of Sharecare through a partnership with Boston University’s School of Public Health.
Funding Information:
This work was partially funded by the National Center for Data to Health (CD2H) grant [NIH/NCATS U24TR002306] and supplemental funding from the National COVID Cohort Collaborative [NIH/NCATS U24TR002306-04S3]. Partial funding was provided by the National Institute of Diabetes and Digestive and Kidney Diseases Ruth L. Kirschstein National Research Service Award of the National Institutes of Health under award number [5T32DK110966-04]. Partial funding was provided by the Southern California Clinical and Translational Science Institute [NIH/NCATS UL1TR001855]. Partial funding was provided by the Boston University Clinical Translational Science Institute [NIH/NCATS 1UL1TR001430-01].
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: There is a need to evaluate how the choice of time interval contributes to inconsistency in the social determinants of health (SDoH) variables that appear important to COVID-19 disease burden, for both case counts and death counts. Methods: This study identified SDoH variables associated with U.S. county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days after each county reached one COVID-19 case per 10,000 residents. The SDoH variables fell into the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used mixed-effects generalized linear modeling of counts normalized per 100,000 population, with negative binomial regression. We performed a Kolmogorov-Smirnov goodness-of-fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. Results: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that across our outcomes, the models of the shorter time periods (30 days, 60 days, and 90 days) have a better fit.
Conclusion: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies based on the time from the start date of the pandemic and when COVID-19 was present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy.
AB - Background: There is a need to evaluate how the choice of time interval contributes to inconsistency in the social determinants of health (SDoH) variables that appear important to COVID-19 disease burden, for both case counts and death counts. Methods: This study identified SDoH variables associated with U.S. county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days after each county reached one COVID-19 case per 10,000 residents. The SDoH variables fell into the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used mixed-effects generalized linear modeling of counts normalized per 100,000 population, with negative binomial regression. We performed a Kolmogorov-Smirnov goodness-of-fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. Results: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that across our outcomes, the models of the shorter time periods (30 days, 60 days, and 90 days) have a better fit.
Conclusion: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies based on the time from the start date of the pandemic and when COVID-19 was present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy.
UR - http://www.scopus.com/inward/record.url?scp=85128258590&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128258590&partnerID=8YFLogxK
U2 - 10.1186/s12889-022-13168-y
DO - 10.1186/s12889-022-13168-y
M3 - Article
C2 - 35421958
AN - SCOPUS:85128258590
SN - 1471-2458
VL - 22
JO - BMC Public Health
JF - BMC Public Health
IS - 1
M1 - 747
ER -