Defining and discovering interactive causes

Xia Jiang*, Richard Neapolitan

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Chapter

    Abstract

    The problem of learning causal influences from passive data has attracted a good deal of attention over the past 30 years, and techniques have been developed and tested. These techniques assume the composition property, which entails that they cannot in general learn interactive causes that have little marginal effect. However, such interactions are fairly commonplace. One notable example is genetic epistasis, the interaction of two or more genetic loci to affect phenotype; often the individual genes exhibit little marginal effect. Another important example is the interaction of a treatment with patient features to affect outcomes. Although efforts have recently been made to develop new algorithms that discover such interactions from data, to our knowledge no definition of a discrete causal interaction has been put forward. Using information theory, we develop a fuzzy definition of a discrete causal interaction, called Interaction Strength (IS). The IS is bounded above by 1 and equals 1 if the causes in the interaction exhibit no marginal effects. Using the IS and Bayesian network (BN) scoring, we develop an exhaustive search algorithm, Exhaustive-IGain, which learns interactions from low-dimensional datasets, and a heuristic search algorithm, MBS-IGain, which learns interactions from high-dimensional datasets. Using simulated high-dimensional datasets based on models of genetic epistasis, we compare MBS-IGain to seven algorithms that learn genetic epistasis from high-dimensional datasets, and show that MBS-IGain’s discovery performance is notably better than that of the other methods. We apply MBS-IGain to a real LOAD dataset and obtain both results that substantiate previous research and new results. Using low-dimensional simulated datasets, we show that Exhaustive-IGain can learn 4-cause interactions with no marginal effects. We apply Exhaustive-IGain to a real clinical breast cancer dataset and learn interactions that agree with the judgements of a breast cancer oncologist.
Our algorithms are directly applicable only to problems where we have a specified target and its candidate causes. However, they could be used for general causal learning by serving as a front end to a standard causal learning algorithm.
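
To make the idea of an interaction with no marginal effects concrete, the following is a minimal Python sketch under stated assumptions. It uses a toy XOR target, which is not taken from the chapter, and computes empirical information gain for each cause alone and for the pair together. The function names and the final `illustrative_ratio` are hypothetical and chosen only for this illustration; the ratio is not the chapter's Interaction Strength, whose exact definition is not given in the abstract.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(target, causes):
    """IG(target; causes) = H(target) - H(target | causes)."""
    n = len(target)
    groups = {}
    for t, c in zip(target, causes):
        groups.setdefault(c, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Toy data (an assumption for illustration): each (x1, x2) combination occurs
# once, and the target z is x1 XOR x2.
rows = list(product([0, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
z = [a ^ b for a, b in rows]

ig_x1 = information_gain(z, x1)       # marginal effect of x1 alone
ig_x2 = information_gain(z, x2)       # marginal effect of x2 alone
ig_joint = information_gain(z, rows)  # effect of x1 and x2 taken together

# Hypothetical normalised score, NOT the chapter's IS definition: the share of
# the joint information gain that the two marginal gains do not account for.
illustrative_ratio = (ig_joint - ig_x1 - ig_x2) / ig_joint

print(ig_x1, ig_x2, ig_joint, illustrative_ratio)
```

In this toy case each cause alone carries 0 bits of information about the target, while the pair carries the full 1 bit, so a method that screens causes by their marginal effects would discard both.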

    Original language: English (US)
    Title of host publication: Intelligent Systems Reference Library
    Publisher: Springer Science and Business Media Deutschland GmbH
    Pages: 53-78
    Number of pages: 26
    State: Published - 2018

    Publication series

    Name: Intelligent Systems Reference Library
    Volume: 137
    ISSN (Print): 1868-4394
    ISSN (Electronic): 1868-4408

    Keywords

    • Bayesian network
    • Causal learning
    • Entropy
    • Epistasis
    • GWAS
    • Information gain
    • Interaction
    • SNP

    ASJC Scopus subject areas

    • Computer Science (all)
    • Information Systems and Management
    • Library and Information Sciences
