Abstract
Objective: High BMI is associated with many comorbidities and mortality. This study aimed to elucidate the overall clinical risk of obesity using a genome- and phenome-wide approach. Methods: This study performed a phenome-wide association study of BMI using a clinical cohort of 736,726 adults. This was followed by genetic association studies using two separate cohorts: one consisting of 65,174 adults in the Electronic Medical Records and Genomics (eMERGE) Network and another with 405,432 participants in the UK Biobank. Results: Class 3 obesity was associated with 433 phenotypes, representing 59.3% of all billing codes in individuals with severe obesity. A genome-wide polygenic risk score for BMI, accounting for 7.5% of variance in BMI, was associated with 296 clinical diseases, including strong associations with type 2 diabetes, sleep apnea, hypertension, and chronic liver disease. In all three cohorts, 199 phenotypes were associated with class 3 obesity and polygenic risk for obesity, including novel associations such as increased risk of renal failure, venous insufficiency, and gastroesophageal reflux. Conclusions: This combined genomic and phenomic systematic approach demonstrated that obesity has a strong genetic predisposition and is associated with a considerable burden of disease across all disease classes.
Original language | English (US) |
---|---|
Pages (from-to) | 2477-2488 |
Number of pages | 12 |
Journal | Obesity |
Volume | 30 |
Issue number | 12 |
State | Published - Dec 2022 |
Funding
Jamie R. Robinson received support by the 5T15LM007450 training grant from the National Library of Medicine. Support for the research and personnel was provided by the R01LM010685 grant from the National Library of Medicine and R01GM114128 from the National Institutes of Health (NIH). The Electronic Medical Records and Genomics (eMERGE) sites were funded through several series of grants from the National Human Genome Research Institute: U01HG8657, U01HG006375, and U01HG004610 (Kaiser Permanente Washington/University of Washington); U01HG8685 (Brigham and Women's Hospital); U01HG8672, U01HG006378, and U01HG004608 (Vanderbilt University Medical Center); U01HG8666 and U01HG006828 (Cincinnati Children's Hospital Medical Center); U01HG6379 and U01HG04599 (Mayo Clinic); U01HG8679 and U01HG006382 (Geisinger Clinic); U01HG008680 (Columbia University Health Sciences); U01HG8684 and U01HG006830 (Children's Hospital of Philadelphia); U01HG8673, U01HG006388, and U01HG004609 (Northwestern University); U01HG8676 (Partners Healthcare/Broad Institute); U01HG8664 (Baylor College of Medicine); U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University); U01HG006380 (Icahn School of Medicine at Mount Sinai); and U01HG8701, U01HG006385, and U01HG04603 (Vanderbilt University Medical Center serving as the Coordinating Center); eMERGE Genotyping Centers were also funded through U01HG004438 (Center for Inherited Disease Research [CIDR]) and U01HG004424 (the Broad Institute). Vanderbilt University Medical Center's Synthetic Derivative and BioVU are supported by institutional funding and by the Clinical and Translational Science Awards (CTSA) grant ULTR000445 from the National Center for Advancing Translational Sciences (NCATS)/NIH. Amit V. 
Khera received support from grants 1K08HG010155 and 5UM1HG008895 from the National Human Genome Research Institute, a Hassenfeld Scholar Award from Massachusetts General Hospital, a Merkin Institute Fellowship from the Broad Institute of Massachusetts Institute of Technology and Harvard University, and a sponsored research agreement from IBM Research. The vast majority of Joshua C. Denny's work for this project occurred while he was on faculty at Vanderbilt University before joining the NIH.

We conducted a retrospective observational study using the Vanderbilt University Medical Center (VUMC) Synthetic Derivative, a deidentified version of more than 3 million VUMC patient health records dating back several decades, depending on data type [30–32]. The primary site study population consisted of all adult individuals (≥18 years of age) with at least one documented BMI value. The study protocol was designated as nonhuman subject research by the Institutional Review Board at VUMC. BMI was calculated as weight in kilograms divided by height in meters squared, in which both weight and height were measured at a single encounter.
All measured BMI values were extracted for each adult individual (9,573,624 BMI observations), with BMI data obtained during pregnancy (649,442 observations) or with clinically implausible values (less than 10 or greater than 70 kg/m²; 6,316 observations) excluded. The median BMI for each individual was classified into one of six BMI categories, as defined by the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), including underweight (<18.5), normal (18.5-24.9), overweight (25.0-29.9), and obesity class 1 (30.0-34.9), class 2 (35.0-39.9), and class 3 (≥40.0) [33]. All distinct International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes from each individual's record were captured and translated into PheWAS codes (phecodes), a hierarchical classification system for ICD-9-CM codes [28,34]. A minimum of two instances of a matching ICD-9-CM code on separate days was required to be translated to a phecode using PheWAS code map version 1.2. As many phenotypes occur rarely, we analyzed only those that occurred in a minimum of 20 cases for all PheWAS analyses in this study. The PheWAS was performed using logistic regression models adjusted for age at last BMI value recorded, sex, and self-reported race to determine the association of BMI categories with phenotypes. Using categorical BMI, effect sizes were determined by comparison with those individuals with BMI values in the normal range. We also used mean BMI as the predictor in the PheWAS model to calculate effect sizes per standard deviation (SD) difference in BMI. All PheWAS analyses were performed using the PheWAS package for R statistical software, version 3.4.3 and using PheWAS code map version 1.2 [35]. Two-sided p < 5.6 × 10⁻⁶ was considered statistically significant using Bonferroni correction for multiple comparisons.
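The BMI preprocessing described above (BMI formula, exclusion of pregnancy and implausible observations, per-person median, six CDC/WHO categories) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the category boundaries are treated as half-open intervals (for example, normal is 18.5 up to but not including 25.0), which is an assumption made so that a median such as 24.95 still falls into a category.

```python
from statistics import median

# Plausibility bounds from the text (kg/m^2): values below 10 or above 70 are excluded.
MIN_BMI, MAX_BMI = 10.0, 70.0


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def classify_bmi(value: float) -> str:
    """Map a BMI value to one of the six categories (half-open bins: an assumption)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    if value < 35.0:
        return "obesity class 1"
    if value < 40.0:
        return "obesity class 2"
    return "obesity class 3"


def categorize_individual(bmi_values, pregnant_flags=None):
    """Drop excluded observations, then classify the median of what remains.

    Returns None when no usable observation is left.
    """
    if pregnant_flags is None:
        pregnant_flags = [False] * len(bmi_values)
    usable = [
        value
        for value, pregnant in zip(bmi_values, pregnant_flags)
        if not pregnant and MIN_BMI <= value <= MAX_BMI
    ]
    if not usable:
        return None
    return classify_bmi(median(usable))
```

For example, `categorize_individual([22.0, 41.0, 43.0, 5.0, 80.0])` discards 5.0 and 80.0 as implausible and classifies the median of the remaining three values.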
Data for the genomic analyses were derived from two separate cohorts, the first being the Electronic Medical Records and Genomics (eMERGE) Network, a national network organized and funded by the National Human Genome Research Institute (NHGRI) [36]. eMERGE combines DNA biorepositories with EHRs for large-scale, high-throughput genetic research. Both the genomic and phenomic data (ICD-9-CM diagnosis codes and demographics) were coalesced into a central repository. Of the eMERGE cohort, a total of 19,590 (30.1%) individuals were from VUMC and they likely also contributed data to the clinical cohort, although deidentification prevents the ability to confirm overlap. The second genomic cohort was derived from a maximal subset of unrelated UK Biobank participants with both genomic and phenomic data available. The eMERGE population in this study consisted of individuals from institutions contributing data to the eMERGE Network phases I through III (Supporting Information Table S1). Inclusion criteria were age ≥ 18 years with extant genome-wide genotyping data and ICD-9-CM codes. The eMERGE Network has unified genetic results from 12 different sites across 78 genotype array batches through imputation using the Michigan Imputation Server [37] and Haplotype Reference Consortium (HRC1.1) [38]. This pipeline has resulted in an imputed genome-wide set of approximately 40 million single-nucleotide variant marker allele doses down to 0.1% minor allele frequency. Genotype array files were referenced to build 37 genome positions using the forward genome strand. Quality control included filtering for sample missingness < 2.0% and SNP missingness < 2.0% in the preprocessing of the data before imputation. For duplicate samples on differing arrays, the sample with the most genotyped variants for that participant was selected for the merged data set. 
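The pre-imputation missingness filter mentioned above (sample missingness < 2.0% and SNP missingness < 2.0%) could look like the sketch below. It is illustrative only: the text does not say in which order the two filters were applied or which tool was used, so filtering SNPs first and then samples is an assumption, as are the function names and the use of `None` to mark a missing genotype call.

```python
def missing_rate(calls):
    """Fraction of entries in `calls` that are missing (None)."""
    return sum(call is None for call in calls) / len(calls)


def qc_filter(genotypes, max_missing=0.02):
    """Filter a samples x SNPs matrix given as a list of rows.

    Assumed order: keep SNPs with missingness below max_missing, then keep
    samples whose missingness over the kept SNPs is below max_missing.
    Returns (kept_sample_indices, kept_snp_indices).
    """
    n_snps = len(genotypes[0])
    kept_snps = [
        j for j in range(n_snps)
        if missing_rate([row[j] for row in genotypes]) < max_missing
    ]
    if not kept_snps:
        return [], []
    kept_samples = [
        i for i, row in enumerate(genotypes)
        if missing_rate([row[j] for j in kept_snps]) < max_missing
    ]
    return kept_samples, kept_snps
```

With only a handful of SNPs, a single missing call pushes a sample far above 2%, so the sample filter is only meaningful on array-scale data; the toy dimensions are for illustration.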
Principal component analysis using the first 10 principal components was performed to determine genetic ancestry using PLINK [39], with variants having >5% minor allele frequency. Single-nucleotide variants with a missing rate > 10% or those not meeting the linkage disequilibrium threshold of r² < 0.7 were excluded in principal component analyses. We performed identity by descent (IBD) analysis to identify related individuals. One individual from suspected monozygotic twins or duplicates was excluded randomly. Participant relatedness was determined using probability of zero alleles IBD (Z0) < 0.83 and probability of having one allele IBD (Z1) > 0.10 to capture first- through third-degree relatives. The oldest family member from each family was included in the cohort analysis. Minimum mean imputation r² was 0.83, with a mean r² of 0.95 across imputed SNPs. A maximal subset of unrelated UK Biobank participants after application of quality control was selected, as detailed in Bycroft et al. supplemental section 3.3.2 [40]. Individuals in this subset were chosen to have no other related individuals within three degrees within the subset, to have genotyping missingness < 2%, to have no mismatch between genetically inferred and reported sex, and to not be outliers for heterozygosity or genotype missingness. We additionally removed individuals without a BMI measurement at the time of enrollment, as well as those who revoked consent after enrollment. This left 405,432 UK Biobank participants for analysis. The limited PRS in each genomic cohort was calculated from 97 SNPs (Supporting Information Table S3) associated with BMI at genome-wide significance in a prior meta-analysis of GWAS conducted by the Genetic Investigation of ANthropometric Traits (GIANT) Consortium [11]. The 97-SNP PRS was computed for each participant by multiplying the effect estimate at each allele by the genetic dosage of the effect allele, summing the values across all SNPs for each participant.
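The score computation described in the last sentence above is a weighted sum: each SNP's effect estimate multiplied by the participant's effect-allele dosage, summed over all SNPs. A minimal sketch follows; the function name, the dictionary layout, and the SNP identifiers in the usage note are hypothetical, and raising an error for a missing SNP is a design choice made here, not something the text specifies.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one participant.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2; fractional if imputed)
    weights: dict mapping SNP id -> per-allele effect estimate (beta)
    Raises KeyError if a weighted SNP is absent from `dosages`, rather than
    silently skipping it.
    """
    return sum(beta * dosages[snp] for snp, beta in weights.items())
```

For instance, with weights `{"snpA": 0.1, "snpB": -0.05, "snpC": 0.2}` and dosages `{"snpA": 2, "snpB": 1, "snpC": 0.5}`, the score is 0.1 × 2 − 0.05 × 1 + 0.2 × 0.5 = 0.25.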
Within the eMERGE cohort, we also calculated a 941-SNP PRS using data from a meta-analysis of 681,275 individuals from GWAS analyses in the GIANT Consortium studies and the UK Biobank [12]. The genome-wide PRS was computed for each participant with the same procedure, using the best-performing LDpred-adjusted values (from a model built assuming that 3% of variants are causal, constructed with 2,100,302 variants), as described previously [10]. In this approach, each variant's posterior mean effect is calculated based on the prior effect estimate and a shrinkage based on the variant's correlation structure with other variants from the reference population [41]. As all sets were imputed to the same reference standard, we were able to use all SNPs for calculation of the genome-wide PRS. The BMI variance explained (adjusted R²) by the associated SNPs was calculated with individual-level genotype and phenotype data using linear regression models adjusted for site, age, sex, and the first 10 principal components. To calculate effect estimates for genetically determined BMI on disease phenotypes, a PheWAS was performed, as described earlier, using logistic regression models adjusted for site, age, sex, and the first 10 principal components. For phenotypes already passing a Bonferroni significance threshold for association with class 3 obesity in the primary cohort, a false discovery rate significance threshold < 0.05 was used to assess for replication of obesity associations with the genomic score. Effect estimates for the 97-SNP score are reported per SD difference in BMI (derived from β estimates and SD of 4.8 kg/m² in a prior GIANT cohort of 449,472 individuals). In the eMERGE cohort, we performed association analyses using both the 97-SNP PRS and a 941-SNP PRS to demonstrate the improvement of association results with increasing quantities of SNPs included in the PRS. The 941-SNP PRS was scaled to a mean of 0 and SD of 1 prior to PheWAS analysis.
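Scaling a score to a mean of 0 and SD of 1, as described above, is a z-score transformation. The sketch below is illustrative; the text does not state whether the sample (n − 1) or population SD was used, so the sample SD here is an assumption, and the function name is hypothetical.

```python
from statistics import fmean, stdev


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (sample SD, n - 1 denominator: an assumption)."""
    mu = fmean(scores)
    sd = stdev(scores)
    if sd == 0:
        raise ValueError("cannot standardize constant scores")
    return [(score - mu) / sd for score in scores]
```

After this scaling, a logistic regression coefficient for the score is a log odds ratio per 1-SD increase in the score, which makes estimates comparable across scores built from different numbers of SNPs.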
This analysis was not performed in the UK Biobank cohort to reduce bias owing to sample overlap. We similarly performed a PheWAS using the genome-wide PRS, which was scaled to a mean of 0 and SD of 1 prior to analysis. Effect estimates were compared using correlation coefficient analysis to determine the similarity between clinical and genomic effect sizes. ICD-10 codes for hospitalizations were reported by the UK Biobank. These were translated into phecodes using mappings described previously [42]. For each phecode, a logistic regression model was computed, predicting the presence or absence of a phecode as a function of the PRS, sex, age at enrollment, the UK Biobank genotyping array, and the first 10 principal components of ancestry. For phenotypes already passing a Bonferroni significance threshold for association with class 3 obesity in the primary cohort, as well as those showing a significant association in the eMERGE genomic cohort, a false discovery rate significance threshold < 0.05 was used to assess for replication of obesity associations with the risk score. These models were computed separately for the 97-SNP score and the genome-wide PRS. PRS was scaled to a mean of 0 and SD of 1 prior to PheWAS analysis. Effect estimates between cohorts were compared using Pearson correlation analysis. We further completed the genome-wide analysis with exclusion of Phase 1 of the UK Biobank cohort to limit any overfitting of the model using LDpred-calculated values.
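The methods use two multiple-testing rules: a Bonferroni threshold for discovery in the clinical cohort and a false discovery rate below 0.05 for replication in the genomic cohorts. The sketch below illustrates both. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule used here is an assumption, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR q.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * q, then
    reject the k smallest p-values (even if some of them missed their own
    rank-specific cutoff).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

As a rough check on scale, if the family-wise alpha was 0.05 (not stated in the text), the reported threshold of 5.6 × 10⁻⁶ corresponds to roughly 8,900 tests, since 0.05 / 5.6 × 10⁻⁶ ≈ 8,929.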
ASJC Scopus subject areas
- Medicine (miscellaneous)
- Endocrinology, Diabetes and Metabolism
- Endocrinology
- Nutrition and Dietetics