Defining and discovering interactive causes

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

The problem of learning causal influences from passive data has attracted a good deal of attention in the past 30 years, and techniques have been developed and tested. These techniques assume the composition property, which entails that they cannot in general learn interactive causes with little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, the interaction of two or more genetic loci to affect phenotype; often the individual genes exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Although efforts have recently been made to develop new algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets, and a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to 7 algorithms that learn genetic epistasis from high-dimensional datasets, and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real LOAD dataset and obtain results that substantiate previous research as well as new results. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset and learn interactions that agree with the judgements of a breast cancer oncologist.
Our algorithms are directly applicable only to problems where we have a specified target and its candidate causes. However, they could be used for general causal learning by serving as a front end to a standard causal learning algorithm.
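
As a rough illustration of the idea in the abstract, the sketch below computes information gain from counts and a normalized interaction score that is 1 when the causes have no marginal effects. The abstract does not state the IS formula, so the form used here (the minimum, over proper non-empty subsets A of the cause set M, of [IG(M) - IG(M minus A) - IG(A)] / IG(M)) is an assumption chosen to match the stated properties, not necessarily the chapter's definition. The function names and the toy XOR data are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(rows, target, causes):
    """IG(target; causes) = H(target) - H(target | causes), from empirical counts."""
    if not causes:
        return 0.0
    t = [r[target] for r in rows]
    cond = [tuple(r[c] for c in causes) for r in rows]
    joint = [tuple(r[c] for c in causes) + (r[target],) for r in rows]
    # H(T | C) = H(C, T) - H(C)
    return entropy(t) - (entropy(joint) - entropy(cond))


def interaction_strength(rows, target, causes):
    """Assumed IS form (see lead-in); not taken verbatim from the chapter."""
    if len(causes) < 2:
        raise ValueError("an interaction needs at least two causes")
    total = info_gain(rows, target, causes)
    if total == 0:
        return 0.0
    ratios = []
    for k in range(1, len(causes)):
        for subset in combinations(causes, k):
            rest = [c for c in causes if c not in subset]
            gain = total - info_gain(rows, target, rest) - info_gain(rows, target, list(subset))
            ratios.append(gain / total)
    return min(ratios)


# Toy data: Z = X XOR Y is a pure interaction; Z = X has only a marginal effect.
xor_rows = [{"X": x, "Y": y, "Z": x ^ y} for x, y in product([0, 1], repeat=2)]
marginal_rows = [{"X": x, "Y": y, "Z": x} for x, y in product([0, 1], repeat=2)]

xor_is = interaction_strength(xor_rows, "Z", ["X", "Y"])
marginal_is = interaction_strength(marginal_rows, "Z", ["X", "Y"])
```

In the XOR data neither X nor Y alone carries any information about Z, while the pair determines it, so the assumed score reaches its upper bound. In the second dataset X alone explains Z, so nothing is left over for the interaction.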

Language: English (US)
Title of host publication: Intelligent Systems Reference Library
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 53-78
Number of pages: 26
Volume: 137
DOI: https://doi.org/10.1007/978-3-319-67513-8_4
State: Published - Jan 1 2018

Publication series

Name: Intelligent Systems Reference Library
Volume: 137
ISSN (Print): 1868-4394
ISSN (Electronic): 1868-4408

Keywords

  • Bayesian network
  • Causal learning
  • Entropy
  • Epistasis
  • GWAS
  • Information gain
  • Interaction
  • SNP

ASJC Scopus subject areas

  • Computer Science(all)
  • Information Systems and Management
  • Library and Information Sciences

Cite this

Jiang, X., & Neapolitan, R. E. (2018). Defining and discovering interactive causes. In Intelligent Systems Reference Library (Vol. 137, pp. 53-78). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-67513-8_4


