Abstract
Genetic epidemiologists strive to determine the genetic profile of diseases. Two or more genes can interact to have a causal effect on disease even when little or no such effect can be observed statistically for one or even both of the genes individually. This gene-gene interaction is called epistasis. It stands in contrast to Mendelian diseases such as cystic fibrosis, which are associated with variation at a single genetic locus. To uncover this dark matter of genetic risk, it is pivotal to be able to discover epistatic relationships from data. The recent availability of high-dimensional data sets affords an unprecedented opportunity to make headway. However, there are two central barriers to identifying genetic interactions using such data sets. First, epistatic interactions are difficult to detect with parametric statistical methods such as logistic regression, because the data are sparse and the relationships are non-linear. Second, the number of candidate models in a high-dimensional data set is forbiddingly large. This paper describes recent research addressing these two barriers. To address the first barrier, the primary author and colleagues developed a specialized Bayesian network model for representing the relationship between features and disease, and a Bayesian network scoring criterion tailored to this model. This research is summarized in Section 2. To address the second barrier, the primary author and colleagues developed an enhancement of Greedy Equivalence Search. This research is discussed in Section 3. Background is provided in Section 1.
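As a toy illustration of the two barriers named in the abstract (not taken from the chapter itself), the Python sketch below builds a hypothetical XOR-style penetrance table for two binary loci and counts candidate locus sets. The risk values 0.9 and 0.1, the variant frequency of 1/2, the 500,000-locus panel size, and the helper names `marginal_risk` and `n_candidate_sets` are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical penetrance table P(disease | A, B) for two binary loci:
# risk is high only when exactly one of the two loci carries the variant.
HIGH, LOW = Fraction(9, 10), Fraction(1, 10)
penetrance = {(a, b): HIGH if a != b else LOW
              for a, b in product((0, 1), repeat=2)}

# Assumption: each locus carries the variant with probability 1/2,
# independently of the other locus.
freq = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def marginal_risk(locus, value):
    """P(disease | one locus = value), averaged over the other locus."""
    total = Fraction(0)
    for other in (0, 1):
        genotype = (value, other) if locus == 0 else (other, value)
        total += freq[other] * penetrance[genotype]
    return total


def n_candidate_sets(n_loci, k):
    """Number of distinct k-locus sets among n_loci loci."""
    return comb(n_loci, k)


# Single-locus view: risk per value of each locus taken alone.
for locus in (0, 1):
    print(locus, [marginal_risk(locus, v) for v in (0, 1)])

# Search-space view: pairs and triples in an assumed 500,000-locus panel.
print(n_candidate_sets(500_000, 2), n_candidate_sets(500_000, 3))
```

Under these assumed values, each locus taken alone shows the same disease risk whichever value it takes, so a single-locus test has nothing to detect even though the joint table encodes a strong effect. The pair count for the assumed panel is roughly 1.25 × 10^11, which gives a sense of why exhaustive search over candidate models quickly becomes impractical.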
Original language | English (US) |
---|---|
Title of host publication | Data Mining |
Subtitle of host publication | Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications |
Editors | Dawn Holmes, Lakhmi Jain |
Pages | 187-209 |
Number of pages | 23 |
DOIs | |
State | Published - 2012 |
Publication series
Name | Intelligent Systems Reference Library |
---|---|
Volume | 25 |
ISSN (Print) | 1868-4394 |
ISSN (Electronic) | 1868-4408 |
Funding
This work was supported by grants from The Commonwealth Fund (#20130084) and the National Institute on Aging-National Institutes of Health (K23AG035030-Morden). The authors have no conflicts to report.
ASJC Scopus subject areas
- General Computer Science
- Information Systems and Management
- Library and Information Sciences