ScanMap: Supervised Confounding Aware Non-negative Matrix Factorization for Polygenic Risk Modeling

Research output: Contribution to journal › Conference article › peer-review

5 Scopus citations

Abstract

Molecular mechanisms are important for informing targeted interventions and are often encoded in gene sets or pathways. Existing machine learning approaches often struggle to reduce high dimensionality while also learning features that discriminate between disease types, particularly given the usual presence of confounding variables. We aim to improve the accuracy and interpretability of prediction models by introducing Supervised Confounding Aware Non-negative Matrix Factorization for Polygenic Risk Modeling (ScanMap) for genetic studies. ScanMap selects informative groups of genes that embody multiple interacting molecular functions by using a supervised model that integrates both the gene groups and the confounding variables in predicting disease type and status. The learned groups of genes reflect interacting molecular mechanisms, which makes them suitable features for polygenic risk modeling. These learned features are then used to train a softmax classifier for disease type and status prediction. We evaluated ScanMap against multiple state-of-the-art unsupervised and supervised matrix factorization models using large-scale next-generation sequencing (NGS) datasets. ScanMap significantly outperformed all comparison models (p < 0.05). Feature analysis was performed to illustrate the insights and benefits of the gene groups learned by ScanMap in disease risk prediction.
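The pipeline described in the abstract (factorize a non-negative sample-by-gene matrix into gene groups, then predict disease type from the group loadings together with the confounders) can be illustrated with a minimal sketch. This is not the authors' implementation: ScanMap learns the factorization jointly with the supervised objective, whereas the sketch below is a simplified two-stage stand-in that runs plain NMF with multiplicative updates and then fits a softmax regression on the concatenated loadings and confounders. The function names, toy data, rank, learning rate, and iteration counts are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative X (samples x genes) as W @ H using
    multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps  # per-sample loadings on each gene group
    H = rng.random((k, m)) + eps  # gene groups: one row of gene weights each
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(F, y, n_classes, lr=0.1, n_iter=500):
    """Fit multinomial logistic regression by batch gradient descent."""
    n, d = F.shape
    Y = np.eye(n_classes)[y]          # one-hot labels
    B = np.zeros((d, n_classes))      # feature weights
    b = np.zeros(n_classes)           # class intercepts
    for _ in range(n_iter):
        G = (softmax(F @ B + b) - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        B -= lr * (F.T @ G)
        b -= lr * G.sum(axis=0)
    return B, b

# Hypothetical toy data: 60 samples, 40 genes, 2 confounders, 3 disease types.
rng = np.random.default_rng(1)
X = rng.random((60, 40))            # non-negative gene-level matrix
C = rng.normal(size=(60, 2))        # confounding variables (e.g. age, ancestry score)
y = rng.integers(0, 3, size=60)     # disease-type labels

W, H = nmf(X, k=5)                  # stage 1: unsupervised gene groups
F = np.hstack([W, C])               # stage 2 input: group loadings plus confounders
B, b = fit_softmax(F, y, n_classes=3)
probs = softmax(F @ B + b)          # predicted class probabilities per sample
```

In this sketch the rows of `H` play the role of gene groups and the first five rows of `B` show how each group contributes to each class, while the last two rows absorb the confounders. A joint formulation would instead add the classification loss to the factorization objective so that the labels shape the groups themselves.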

Original language: English (US)
Pages (from-to): 27-45
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 126
State: Published - 2020
Event: 5th Machine Learning for Healthcare Conference, MLHC 2020 - Virtual, Online
Duration: Aug 7 2020 → Aug 8 2020

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
