Abstract
Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Original language | English (US) |
---|---|
Pages (from-to) | 583-592 |
Number of pages | 10 |
Journal | Nature |
Volume | 631 |
Issue number | 8021 |
State | Published - Jul 18 2024 |
Funding
This work was supported in part by the Intramural Research Program of the National Institute of Mental Health (ZIA-MH002843) and an NCI grant (R01 CA157823). Ethical approval for the UK Biobank was previously obtained from the North West Centre for Research Ethics Committee (11/NW/0382). The work described herein was approved by the UK Biobank under application number 26041. Informed consent was obtained for all study participants. Approval for Geisinger Health System MyCode analyses was provided by the Geisinger Health System Institutional Review Board under project number 2006-0258. Informed consent was obtained for all study participants. Appropriate consent for the Penn Medicine BioBank was obtained from each participant regarding the storage of biological specimens, genetic sequencing and genotyping and access to all available EHR data. This study was approved by the Institutional Review Board of the University of Pennsylvania and complied with the principles set out in the Declaration of Helsinki. All individuals participating in the Mayo–RGC project generation provided informed consent for the use of specimens and data in genetic and health research, and ethical approval for project generation was provided by the Mayo Clinic Institutional Review Board (09-007763). All research performed in this study used de-identified data (without any Protected Health Information data) with no possibility of re-identifying any of the participants. Approval for the Indiana Biobank was provided by the Indiana University Institutional Review Board under project number 1105005445.
For participants in the Mexico City Prospective Study, approval for the study was given by the Mexican Ministry of Health, the Mexican National Council of Science and Technology (0595 P-M), the Central Oxford Research Ethics Committee (C99.260) and the Ethics and Research commissions from the Medicine Faculty at the National Autonomous University of Mexico (UNAM) (FMED/CI/SPLR/067/2015). All study participants provided written informed consent. Study participants were recruited from the BioMe Biobank Program of the Charles Bronfman Institute for Personalized Medicine at Mount Sinai Medical Center from 2007 onward. The BioMe Biobank Program (Institutional Review Board 07-0529) operates under a Mount Sinai Institutional Review Board-approved research protocol. All study participants provided written informed consent. The authors thank the RGC-ME Cohort Partners for contributions to this initiative. A list of cohorts (with links to the projects where available) and the data contributors can be accessed from the RGC-ME browser (https://rgc-research.regeneron.com/me/data-contributors). The authors thank everyone who made this work possible, the professionals from the member institutions who contributed to and supported this work and, most especially, all of the participants, without whom this research would not be possible. This study is funded by Regeneron Genetics Center and Regeneron Pharmaceuticals.
ASJC Scopus subject areas
- General