TY - JOUR
T1 - A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts
AU - Lindström, Sara
AU - Loomis, Stephanie
AU - Turman, Constance
AU - Huang, Hongyan
AU - Huang, Jinyan
AU - Aschard, Hugues
AU - Chan, Andrew T.
AU - Choi, Hyon
AU - Cornelis, Marilyn
AU - Curhan, Gary
AU - De Vivo, Immaculata
AU - Eliassen, A. Heather
AU - Fuchs, Charles
AU - Gaziano, Michael
AU - Hankinson, Susan E.
AU - Hu, Frank
AU - Jensen, Majken
AU - Kang, Jae H.
AU - Kabrhel, Christopher
AU - Liang, Liming
AU - Pasquale, Louis R.
AU - Rimm, Eric
AU - Stampfer, Meir J.
AU - Tamimi, Rulla M.
AU - Tworoger, Shelley S.
AU - Wiggs, Janey L.
AU - Hunter, David J.
AU - Kraft, Peter
N1 - Publisher Copyright:
© 2017 Lindström et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2017/3
Y1 - 2017/3
AB - The Nurses' Health Study (NHS), Nurses' Health Study II (NHSII), Health Professionals Follow-up Study (HPFS), and Physicians' Health Study (PHS) have collected detailed longitudinal data on multiple exposures and traits for approximately 310,000 study participants over the last 35 years. Over 160,000 participants across the cohorts have donated a DNA sample, and to date 20,691 subjects have been genotyped as part of genome-wide association studies (GWAS) of twelve primary outcomes. However, these studies used six different GWAS arrays, making it difficult to conduct analyses of secondary phenotypes or to share controls across studies. To allow secondary analyses of these data, we created three new datasets merged by platform family and performed imputation using a common reference panel, the 1000 Genomes Phase I release. Here, we describe the methodology behind the data merging and imputation, and present imputation quality statistics and association results from two GWAS of secondary phenotypes: body mass index (BMI) and venous thromboembolism (VTE). We observed the strongest BMI association for the FTO SNP rs55872725 (β = 0.45, p = 3.48 × 10^-22), and, using a significance level of p = 0.05, we replicated 19 of 32 known BMI SNPs. For VTE, we observed the strongest association for rs2040445 (OR = 2.17, 95% CI: 1.79–2.63, p = 2.70 × 10^-15), located downstream of F5, and also observed significant associations in the known ABO and F11 regions. This pooled resource can be used to maximize power in GWAS of phenotypes collected across the cohorts, and to study gene-environment interactions as well as rare phenotypes and genotypes.
UR - http://www.scopus.com/inward/record.url?scp=85015615777&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015615777&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0173997
DO - 10.1371/journal.pone.0173997
M3 - Review article
C2 - 28301549
AN - SCOPUS:85015615777
SN - 1932-6203
VL - 12
JO - PLoS One
JF - PLoS One
IS - 3
M1 - e0173997
ER -