Extracting relationships by multi-domain matching

Yitong Li, Michael Andrew Murias, Samantha Major, Geraldine Dawson, David E. Carlson

Research output: Contribution to journal (Conference article)

Abstract

In many biological and medical contexts, we construct a large labeled corpus by aggregating many sources to use in target prediction tasks. Unfortunately, many of the sources may be irrelevant to our target task, so ignoring the structure of the dataset is detrimental. This work proposes a novel approach, the Multiple Domain Matching Network (MDMN), to exploit this structure. MDMN embeds all data into a shared feature space while learning which domains share strong statistical relationships. These relationships are often insightful in their own right, and they allow domains to share strength without interference from irrelevant data. This methodology builds on existing distribution-matching approaches by assuming that source domains are varied and outcomes multi-factorial. Therefore, each domain should only match a relevant subset. Theoretical analysis shows that the proposed approach can have a tighter generalization bound than existing multiple-domain adaptation approaches. Empirically, we show that the proposed methodology handles higher numbers of source domains (up to 21) and provides state-of-the-art performance on image, text, and multi-channel time series classification, including clinical outcome data in an open-label trial evaluating a novel treatment for Autism Spectrum Disorder.

Original language: English (US)
Pages (from-to): 6798-6809
Number of pages: 12
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - Jan 1 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 to Dec 8 2018

Fingerprint

Labels
Time series

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Li, Y., Murias, M. A., Major, S., Dawson, G., & Carlson, D. E. (2018). Extracting relationships by multi-domain matching. Advances in Neural Information Processing Systems, 2018-December, 6798-6809.
Li, Yitong ; Murias, Michael Andrew ; Major, Samantha ; Dawson, Geraldine ; Carlson, David E. / Extracting relationships by multi-domain matching. In: Advances in Neural Information Processing Systems. 2018 ; Vol. 2018-December. pp. 6798-6809.
@article{6d0b414308764dc086253f4d87efb527,
title = "Extracting relationships by multi-domain matching",
author = "Yitong Li and Murias, {Michael Andrew} and Samantha Major and Geraldine Dawson and Carlson, {David E.}",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
volume = "2018-December",
pages = "6798--6809",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

Li, Y, Murias, MA, Major, S, Dawson, G & Carlson, DE 2018, 'Extracting relationships by multi-domain matching', Advances in Neural Information Processing Systems, vol. 2018-December, pp. 6798-6809.

Extracting relationships by multi-domain matching. / Li, Yitong; Murias, Michael Andrew; Major, Samantha; Dawson, Geraldine; Carlson, David E.

In: Advances in Neural Information Processing Systems, Vol. 2018-December, 01.01.2018, p. 6798-6809.

TY - JOUR

T1 - Extracting relationships by multi-domain matching

AU - Li, Yitong

AU - Murias, Michael Andrew

AU - Major, Samantha

AU - Dawson, Geraldine

AU - Carlson, David E.

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85064835942&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064835942&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85064835942

VL - 2018-December

SP - 6798

EP - 6809

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -

Li Y, Murias MA, Major S, Dawson G, Carlson DE. Extracting relationships by multi-domain matching. Advances in Neural Information Processing Systems. 2018 Jan 1;2018-December:6798-6809.