Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study

Abel N. Kho*, M. Geoffrey Hayes, Laura Rasmussen-Torvik, Jennifer A. Pacheco, William K. Thompson, Loren L. Armstrong, Joshua C. Denny, Peggy L. Peissig, Aaron W. Miller, Wei Qi Wei, Suzette J. Bielinski, Christopher G. Chute, Cynthia L. Leibson, Gail P. Jarvik, David R. Crosslin, Christopher S. Carlson, Katherine M. Newton, Wendy A. Wolf, Rex L. Chisholm, William L. Lowe

*Corresponding author for this work

Research output: Contribution to journal (Article)

154 Citations (Scopus)

Abstract

Objective: Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.

Materials and Methods: An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.

Results: The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.

Discussion: By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.

Conclusions: An algorithm using commonly available data from five different EMR can accurately identify T2D cases and controls for genetic study across multiple institutions.
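
For readers unfamiliar with the metric, the positive predictive values reported in the abstract are the fraction of algorithm-identified subjects that clinician review confirmed. The following is a minimal Python sketch of that calculation only. The function name and the tallies are hypothetical illustrations, not the study's code or data; the abstract does not report the underlying chart-review counts.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return TP / (TP + FP): the share of flagged records confirmed by review."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no records were flagged")
    return true_positives / flagged


# Hypothetical chart-review tallies, NOT taken from the study.
hypothetical_case_ppv = positive_predictive_value(true_positives=49, false_positives=1)
hypothetical_control_ppv = positive_predictive_value(true_positives=50, false_positives=0)

print(f"cases: {hypothetical_case_ppv:.0%}, controls: {hypothetical_control_ppv:.0%}")
```

With these made-up tallies, one unconfirmed record among 50 flagged cases gives a PPV of 98%, and no unconfirmed records among 50 flagged controls gives 100%.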

Original language: English (US)
Pages (from-to): 212-218
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 19
Issue number: 2
DOI: https://doi.org/10.1136/amiajnl-2011-000439
State: Published - Mar 1 2012

Fingerprint

Electronic Health Records
Genome-Wide Association Study
Type 2 Diabetes Mellitus
Clinical Laboratory Techniques
Genetic Association Studies
Terminology
Meta-Analysis
Case-Control Studies
Genes

ASJC Scopus subject areas

  • Health Informatics

Cite this

Kho, Abel N. ; Hayes, M. Geoffrey ; Rasmussen-Torvik, Laura ; Pacheco, Jennifer A. ; Thompson, William K. ; Armstrong, Loren L. ; Denny, Joshua C. ; Peissig, Peggy L. ; Miller, Aaron W. ; Wei, Wei Qi ; Bielinski, Suzette J. ; Chute, Christopher G. ; Leibson, Cynthia L. ; Jarvik, Gail P. ; Crosslin, David R. ; Carlson, Christopher S. ; Newton, Katherine M. ; Wolf, Wendy A. ; Chisholm, Rex L. ; Lowe, William L. / Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. In: Journal of the American Medical Informatics Association. 2012 ; Vol. 19, No. 2. pp. 212-218.
@article{6002fb945153462fa8b8ce28dd5e5ec2,
title = "Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study",
abstract = "Objective: Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems. Materials and Methods: An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions. Results: The algorithm achieved 98{\%} and 100{\%} positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D. Discussion: By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS. Conclusions: An algorithm using commonly available data from five different EMR can accurately identify T2D cases and controls for genetic study across multiple institutions.",
author = "Kho, {Abel N.} and Hayes, {M. Geoffrey} and Rasmussen-Torvik, Laura and Pacheco, {Jennifer A.} and Thompson, {William K.} and Armstrong, {Loren L.} and Denny, {Joshua C.} and Peissig, {Peggy L.} and Miller, {Aaron W.} and Wei, {Wei Qi} and Bielinski, {Suzette J.} and Chute, {Christopher G.} and Leibson, {Cynthia L.} and Jarvik, {Gail P.} and Crosslin, {David R.} and Carlson, {Christopher S.} and Newton, {Katherine M.} and Wolf, {Wendy A.} and Chisholm, {Rex L.} and Lowe, {William L.}",
year = "2012",
month = "3",
day = "1",
doi = "10.1136/amiajnl-2011-000439",
language = "English (US)",
volume = "19",
pages = "212--218",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",
}

Kho, AN, Hayes, MG, Rasmussen-Torvik, L, Pacheco, JA, Thompson, WK, Armstrong, LL, Denny, JC, Peissig, PL, Miller, AW, Wei, WQ, Bielinski, SJ, Chute, CG, Leibson, CL, Jarvik, GP, Crosslin, DR, Carlson, CS, Newton, KM, Wolf, WA, Chisholm, RL & Lowe, WL 2012, 'Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study', Journal of the American Medical Informatics Association, vol. 19, no. 2, pp. 212-218. https://doi.org/10.1136/amiajnl-2011-000439

TY  - JOUR
T1  - Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study
AU  - Kho, Abel N.
AU  - Hayes, M. Geoffrey
AU  - Rasmussen-Torvik, Laura
AU  - Pacheco, Jennifer A.
AU  - Thompson, William K.
AU  - Armstrong, Loren L.
AU  - Denny, Joshua C.
AU  - Peissig, Peggy L.
AU  - Miller, Aaron W.
AU  - Wei, Wei Qi
AU  - Bielinski, Suzette J.
AU  - Chute, Christopher G.
AU  - Leibson, Cynthia L.
AU  - Jarvik, Gail P.
AU  - Crosslin, David R.
AU  - Carlson, Christopher S.
AU  - Newton, Katherine M.
AU  - Wolf, Wendy A.
AU  - Chisholm, Rex L.
AU  - Lowe, William L.
PY  - 2012/3/1
Y1  - 2012/3/1
AB - Objective: Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems. Materials and Methods: An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions. Results: The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D. Discussion: By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS. Conclusions: An algorithm using commonly available data from five different EMR can accurately identify T2D cases and controls for genetic study across multiple institutions.

UR  - http://www.scopus.com/inward/record.url?scp=84857146745&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84857146745&partnerID=8YFLogxK
U2  - 10.1136/amiajnl-2011-000439
DO  - 10.1136/amiajnl-2011-000439
M3  - Article
C2  - 22101970
AN  - SCOPUS:84857146745
VL  - 19
SP  - 212
EP  - 218
JO  - Journal of the American Medical Informatics Association : JAMIA
JF  - Journal of the American Medical Informatics Association : JAMIA
SN  - 1067-5027
IS  - 2
ER  -