TY - JOUR
T1 - Translating genome wide association study results to associations among common diseases
T2 - In silico study with an electronic medical record
AU - Anand, Vibha
AU - Rosenman, Marc B.
AU - Downs, Stephen M.
PY - 2013/9
Y1 - 2013/9
N2 - Objective: To develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. Area under receiver operator characteristics curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). Results: On the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN. Conclusion: Using only publicly available genetic sources we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.
AB - Objective: To develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. Area under receiver operator characteristics curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). Results: On the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN. Conclusion: Using only publicly available genetic sources we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.
KW - Bayesian network
KW - Bioinformatics
KW - Data mining
KW - Electronic medical records (EMRs)
KW - Genome Wide Association Studies (GWAS)
KW - In silico
KW - Integration
KW - Single nucleotide polymorphisms (SNPs)
UR - http://www.scopus.com/inward/record.url?scp=84881370708&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881370708&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2013.05.003
DO - 10.1016/j.ijmedinf.2013.05.003
M3 - Article
C2 - 23743324
AN - SCOPUS:84881370708
SN - 1386-5056
VL - 82
SP - 864
EP - 874
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
IS - 9
ER -