A Genocentric Approach to Discovery of Mendelian Disorders

Task Force for Neonatal Genomics

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

The advent of inexpensive, clinical exome sequencing (ES) has led to the accumulation of genetic data from thousands of samples from individuals affected with a wide range of diseases, but for whom the underlying genetic and molecular etiology of their clinical phenotype remains unknown. In many cases, detailed phenotypes are unavailable or poorly recorded and there is little family history to guide study. To accelerate discovery, we integrated ES data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, in an Apache Hadoop data lake (Hadoop Architecture Lake of Exomes [HARLEE]) and implemented a genocentric analysis that rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. The approach did not rely on case-specific phenotypic classifications but was driven by optimization of gene- and variant-level filter parameters utilizing historical Mendelian disease-gene association discovery data. Variants in 19 of the 154 candidate genes were subsequently reported as causative of a Mendelian trait and additional data support the association of all other candidate genes with disease endpoints.
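The genocentric idea in the abstract, which ranks genes by the variants they harbor across a cohort rather than by case phenotype, can be illustrated with a minimal in-memory sketch. This is not the HARLEE implementation: the record fields, the allele-frequency cutoff, the set of damaging consequence classes, and the minimum family count are all hypothetical values chosen for illustration, and the sketch does not use Hadoop.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One variant call; all field names are assumptions for this sketch."""
    gene: str
    family_id: str
    consequence: str
    population_af: float  # allele frequency in a reference population


# Hypothetical filter parameters, not taken from the paper.
MAX_POPULATION_AF = 0.0001
DAMAGING = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}
MIN_FAMILIES = 2


def passes_variant_filter(v: Variant) -> bool:
    """Variant-level filter: keep ultra-rare, predicted-damaging variants."""
    return v.population_af <= MAX_POPULATION_AF and v.consequence in DAMAGING


def candidate_genes(variants, min_families=MIN_FAMILIES):
    """Gene-level filter: keep genes hit in at least `min_families` families.

    No phenotype information is consulted at any point.
    """
    families_by_gene = defaultdict(set)
    for v in variants:
        if passes_variant_filter(v):
            # A set counts each family once, so relatives do not inflate support.
            families_by_gene[v.gene].add(v.family_id)
    hits = {g: len(f) for g, f in families_by_gene.items() if len(f) >= min_families}
    # Rank by number of supporting families, then by gene name for stability.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, tuning `MAX_POPULATION_AF`, `DAMAGING`, and `MIN_FAMILIES` against previously published disease-gene discoveries would play the role of the filter-parameter optimization the abstract describes.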

Original language: English (US)
Pages (from-to): 974-986
Number of pages: 13
Journal: American Journal of Human Genetics
Volume: 105
Issue number: 5
DOIs
State: Published - Nov 7 2019

Keywords

  • HARLEE
  • Hadoop
  • Mendelian disease
  • big data
  • clan genomics
  • data lake
  • developmental disorder
  • genotype-first
  • ultra-rare
  • whole-exome sequencing

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
