SpiderLearner: An ensemble approach to Gaussian graphical model estimation

Katherine H. Shutta*, Laura B. Balzer, Denise M. Scholtens, Raji Balasubramanian

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Gaussian graphical models (GGMs) are a popular form of network model in which nodes represent features in multivariate normal data and edges reflect conditional dependencies between these features. GGM estimation is an active area of research. Currently available tools for GGM estimation require investigators to make several choices regarding algorithms, scoring criteria, and tuning parameters. An estimated GGM may be highly sensitive to these choices, and the accuracy of each method can vary based on structural characteristics of the network such as topology, degree distribution, and density. Because these characteristics are a priori unknown, it is not straightforward to establish universal guidelines for choosing a GGM estimation method. We address this problem by introducing SpiderLearner, an ensemble method that constructs a consensus network from multiple estimated GGMs. Given a set of candidate methods, SpiderLearner estimates the optimal convex combination of results from each method using a likelihood-based loss function. K-fold cross-validation is applied in this process, reducing the risk of overfitting. In simulations, SpiderLearner performs better than or comparably to the best candidate methods according to a variety of metrics, including relative Frobenius norm and out-of-sample likelihood. We apply SpiderLearner to publicly available ovarian cancer gene expression data including 2013 participants from 13 diverse studies, demonstrating our tool's potential to identify biomarkers of complex disease. SpiderLearner is implemented as flexible, extensible, open-source code in the R package ensembleGGM at https://github.com/katehoffshutta/ensembleGGM.
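The abstract describes the ensemble step only in words: choose convex weights on the candidate precision matrices so that a cross-validated Gaussian likelihood is as large as possible. As a minimal sketch of that idea, assume centred data, K folds, and candidate estimates Θ_kj (candidate j fitted on the training part of fold k). For held-out fold k with m_k rows and held-out covariance S_k, the weights α are chosen on the simplex (α_j ≥ 0, Σ_j α_j = 1) to maximise Σ_k (m_k/2)(log det Θ_k(α) − tr(S_k Θ_k(α))), where Θ_k(α) = Σ_j α_j Θ_kj. The Python below is an illustration under those assumptions, not the ensembleGGM implementation (which is written in R). The candidates `ridge_precision` and `diagonal_precision` are hypothetical stand-ins for real GGM estimators such as the graphical lasso. The exponentiated-gradient solver is a swapped-in choice of optimiser, and every function name here is illustrative.

```python
import numpy as np


def gaussian_loglik(theta, S, m):
    """Log-likelihood of m centred rows with covariance S under precision theta,
    dropping the constant -m*p/2*log(2*pi)."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        return -np.inf  # not positive definite
    return 0.5 * m * (logdet - np.trace(S @ theta))


def ridge_precision(lam):
    """Hypothetical candidate: inverse of the ridge-regularised sample covariance."""
    def fit(X):
        S = X.T @ X / len(X)  # assumes centred data
        return np.linalg.inv(S + lam * np.eye(X.shape[1]))
    return fit


def diagonal_precision(X):
    """Hypothetical candidate: independence model (a graph with no edges)."""
    return np.diag(1.0 / np.mean(X ** 2, axis=0))


def fit_folds(X, candidates, K=5, seed=0):
    """Fit every candidate on the training part of each of K folds and keep the
    held-out covariance and size needed to score the fold."""
    folds = np.random.default_rng(seed).permutation(len(X)) % K
    out = []
    for k in range(K):
        train, valid = X[folds != k], X[folds == k]
        S_valid = valid.T @ valid / len(valid)
        out.append(([fit(train) for fit in candidates], S_valid, len(valid)))
    return out


def cv_loglik(alpha, fits):
    """Held-out log-likelihood of the convex combination sum_j alpha_j * Theta_j."""
    return sum(gaussian_loglik(sum(a * t for a, t in zip(alpha, thetas)), S, m)
               for thetas, S, m in fits)


def cv_gradient(alpha, fits):
    """d/d alpha_j = sum over folds of m/2 * (tr(Theta^-1 Theta_j) - tr(S Theta_j))."""
    g = np.zeros(len(alpha))
    for thetas, S, m in fits:
        theta_inv = np.linalg.inv(sum(a * t for a, t in zip(alpha, thetas)))
        for j, t in enumerate(thetas):
            g[j] += 0.5 * m * (np.trace(theta_inv @ t) - np.trace(S @ t))
    return g


def spider_weights(fits, n_candidates, n_iter=200):
    """Maximise the concave CV log-likelihood over the simplex by
    exponentiated-gradient ascent with a backtracking step size."""
    alpha = np.full(n_candidates, 1.0 / n_candidates)
    value, step = cv_loglik(alpha, fits), 1.0
    for _ in range(n_iter):
        g = cv_gradient(alpha, fits)
        d = g / (np.abs(g).max() + 1e-12)  # rescale; the direction is unchanged
        accepted = False
        while step > 1e-8:
            proposal = alpha * np.exp(step * (d - d.max()))
            proposal /= proposal.sum()  # multiplicative update stays on the simplex
            new_value = cv_loglik(proposal, fits)
            if new_value >= value:
                accepted = True
                break
            step /= 2  # backtrack until the objective does not decrease
        if not accepted:
            break  # no ascent step found: (near-)stationary point
        alpha, value = proposal, new_value
        step = min(2 * step, 10.0)
    return alpha


def spider_ensemble(X, candidates, alpha):
    """Refit each candidate on all data and return the weighted precision matrix."""
    return sum(a * fit(X) for a, fit in zip(alpha, candidates))


# Hypothetical usage on simulated data with a chain-structured precision matrix.
rng = np.random.default_rng(1)
p = 10
true_theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_theta), size=200)
X -= X.mean(axis=0)  # the likelihood above assumes centred data
candidates = [ridge_precision(0.01), ridge_precision(0.5), diagonal_precision]
fits = fit_folds(X, candidates, K=5)
alpha = spider_weights(fits, len(candidates))
theta_hat = spider_ensemble(X, candidates, alpha)
print("ensemble weights:", np.round(alpha, 3))
```

Because log det is concave and the trace term is linear in α, this cross-validated objective is concave on the simplex, so a simple ascent method is enough for the sketch; a general-purpose constrained optimiser would serve equally well. A convex combination of positive definite candidates is itself positive definite, so every weighted estimate has a finite likelihood.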

Original language: English (US)
Pages (from-to): 2116-2133
Number of pages: 18
Journal: Statistics in Medicine
Volume: 42
Issue number: 13
State: Accepted/In press - 2023

Keywords

  • Gaussian graphical models
  • ensemble models
  • gene expression
  • networks
  • ovarian cancer
  • super learner

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
