TY - JOUR
T1 - Impact of different electronic cohort definitions to identify patients with atrial fibrillation from the electronic medical record
AU - Shah, Rashmee U.
AU - Mukherjee, Rebeka
AU - Zhang, Yue
AU - Jones, Aubrey E.
AU - Springer, Jennifer
AU - Hackett, Ian
AU - Steinberg, Benjamin A.
AU - Lloyd-Jones, Donald M.
AU - Chapman, Wendy W.
N1 - Funding Information:
B.A.S. receives research support from Boston Scientific and Janssen; provides consulting to Janssen, Bayer, and Merit Medical; and speaks for North American Center for Continuing Medical Education (funded by Sanofi). The remaining authors have no disclosures to report.
Funding Information:
This work and R.U.S. are supported by a grant from the National Heart, Lung, and Blood Institute of the National Institutes of Health (K08HL136850). B.A.S. is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (K23HL143156).
Publisher Copyright:
© 2020 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley.
PY - 2020
Y1 - 2020
N2 - Background: Electronic medical records (EMRs) allow identification of disease-specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an electronic medical record–derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions. Methods and Results: Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort and resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, sensitivity of 0.90, and specificity of 0.87. Among a candidate population (n=22 000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion of patients with a CHA2DS2-VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model. Conclusions: Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of electronic medical record–based research and quality initiatives.
AB - Background: Electronic medical records (EMRs) allow identification of disease-specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an electronic medical record–derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions. Methods and Results: Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort and resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, sensitivity of 0.90, and specificity of 0.87. Among a candidate population (n=22 000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion of patients with a CHA2DS2-VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model. Conclusions: Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of electronic medical record–based research and quality initiatives.
KW - Atrial fibrillation
KW - Electronic health records
KW - Health services research
KW - Informatics
KW - Quality of care
UR - http://www.scopus.com/inward/record.url?scp=85090263321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090263321&partnerID=8YFLogxK
U2 - 10.1161/JAHA.119.014527
DO - 10.1161/JAHA.119.014527
M3 - Article
C2 - 32098599
AN - SCOPUS:85090263321
VL - 9
JO - Journal of the American Heart Association
JF - Journal of the American Heart Association
SN - 2047-9980
IS - 5
M1 - e014527
ER -