TY - JOUR
T1 - Enrichment sampling for a multi-site patient survey using electronic health records and census data
AU - Mercaldo, Nathaniel D.
AU - Brothers, Kyle B.
AU - Carrell, David S.
AU - Clayton, Ellen W.
AU - Connolly, John J.
AU - Holm, Ingrid A.
AU - Horowitz, Carol R.
AU - Jarvik, Gail P.
AU - Kitchner, Terrie E.
AU - Li, Rongling
AU - McCarty, Catherine A.
AU - McCormick, Jennifer B.
AU - McManus, Valerie D.
AU - Myers, Melanie F.
AU - Pankratz, Joshua J.
AU - Shrubsole, Martha J.
AU - Smith, Maureen E.
AU - Stallings, Sarah C.
AU - Williams, Janet L.
AU - Schildcrout, Jonathan S.
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2019/3/1
Y1 - 2019/3/1
N2 - Objective: We describe a stratified sampling design that combines electronic health records (EHRs) and United States Census (USC) data to construct the sampling frame and an algorithm to enrich the sample with individuals belonging to rarer strata. Materials and Methods: This design was developed for a multi-site survey that sought to examine patient concerns about and barriers to participating in research studies, especially among under-studied populations (eg, minorities, low educational attainment). We defined sampling strata by cross-tabulating several sociodemographic variables obtained from EHR and augmented with census-block-level USC data. We oversampled rarer and historically underrepresented subpopulations. Results: The sampling strategy, which included USC-supplemented EHR data, led to a far more diverse sample than would have been expected under random sampling (eg, 3-, 8-, 7-, and 12-fold increase in African Americans, Asians, Hispanics and those with less than a high school degree, respectively). We observed that our EHR data tended to misclassify minority races more often than majority races, and that non-majority races, Latino ethnicity, younger adult age, lower education, and urban/suburban living were each associated with lower response rates to the mailed surveys.
UR - http://www.scopus.com/inward/record.url?scp=85060814318&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060814318&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocy164
DO - 10.1093/jamia/ocy164
M3 - Article
C2 - 30590688
AN - SCOPUS:85060814318
SN - 1067-5027
VL - 26
SP - 219
EP - 227
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -