Mining Epistatic Interactions from High-Dimensional Data Sets

Xia Jiang*, Shyam Visweswaran, Richard E. Neapolitan

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Chapter

    1 Scopus citation

    Abstract

    Genetic epidemiologists strive to determine the genetic profile of diseases. Two or more genes can interact to have a causal effect on disease even when little or no such effect can be observed statistically for one or even both of the genes individually. This gene-gene interaction is called epistasis. It stands in contrast to Mendelian diseases like cystic fibrosis, which are associated with variation at a single genetic locus. To uncover this dark matter of genetic risk, it is pivotal to be able to discover epistatic relationships from data. The recent availability of high-dimensional data sets affords an unprecedented opportunity to make headway on this problem. However, there are two central barriers to successfully identifying genetic interactions using such data sets. First, it is difficult to detect epistatic interactions statistically using parametric statistical methods such as logistic regression, due to the sparseness of the data and the non-linearity of the relationships. Second, the number of candidate models in a high-dimensional data set is forbiddingly large. This paper describes recent research addressing these two barriers. To address the first barrier, the primary author and colleagues developed a specialized Bayesian network model for representing the relationship between features and disease, and a Bayesian network scoring criterion tailored to this model. This research is summarized in Section 2. To address the second barrier, the primary author and colleagues developed an enhancement of Greedy Equivalence Search. This research is discussed in Section 3. Background is provided in Section 1.

    Original language: English (US)
    Title of host publication: Data Mining
    Subtitle of host publication: Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications
    Editors: Dawn Holmes, Lakhmi Jain
    Pages: 187-209
    Number of pages: 23
    State: Published - 2012

    Publication series

    Name: Intelligent Systems Reference Library
    Volume: 25
    ISSN (Print): 1868-4394
    ISSN (Electronic): 1868-4408

    Funding

    This work was supported by grants from The Commonwealth Fund (#20130084) and the National Institute on Aging-National Institutes of Health (K23AG035030-Morden). The authors have no conflicts to report.

    ASJC Scopus subject areas

    • General Computer Science
    • Information Systems and Management
    • Library and Information Sciences
