Coefficient tree regression for generalized linear models

Özge Sürer*, Daniel W. Apley, Edward Carl Malthouse

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Large regression data sets are now commonplace, with so many predictors that they cannot or should not all be included individually. In practice, derived predictors are relevant as meaningful features or, at the very least, as a form of regularized approximation of the true coefficients. We consider derived predictors that are sums of groups of individual predictors, which is equivalent to the predictors within a group sharing the same coefficient. However, the groups of predictors are usually not known in advance and must be discovered from the data. In this paper, we develop a coefficient tree regression algorithm for generalized linear models to discover the group structure from the data. The approach results in simple and highly interpretable models, and we demonstrate with real examples that it can provide a clear and concise interpretation of the data. Via simulation studies under different scenarios, we show that our approach performs better than existing competitors in terms of computing time and predictive accuracy.
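
The equivalence mentioned in the abstract (a derived predictor formed by summing a group of predictors, versus the predictors in that group sharing one coefficient) can be checked with a small sketch. The data, the group structure, and the coefficient values below are made up for illustration, and the function names are hypothetical. The sketch only verifies the identity at the level of the linear predictor; it is not the coefficient tree regression algorithm, which has to discover the groups from the data.

```python
# Hypothetical toy data: 3 observations, 4 predictors.
X = [
    [1.0, 2.0, 0.0, 3.0],
    [0.0, 1.0, 4.0, 1.0],
    [2.0, 2.0, 1.0, 0.0],
]

# Assumed group structure: predictors 0 and 1 form one group, 2 and 3 another.
groups = [[0, 1], [2, 3]]
group_coefs = [0.5, -1.0]  # one coefficient per group (illustrative values)


def linear_predictor_shared(X, groups, group_coefs):
    """Full model: every predictor in a group carries the group's coefficient."""
    beta = [0.0] * len(X[0])
    for members, coef in zip(groups, group_coefs):
        for j in members:
            beta[j] = coef
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def linear_predictor_derived(X, groups, group_coefs):
    """Reduced model: one derived predictor per group, the sum of its columns."""
    Z = [[sum(row[j] for j in members) for members in groups] for row in X]
    return [sum(c * z for c, z in zip(group_coefs, z_row)) for z_row in Z]


eta_shared = linear_predictor_shared(X, groups, group_coefs)
eta_derived = linear_predictor_derived(X, groups, group_coefs)
print(eta_shared == eta_derived)
```

In a generalized linear model the mean response is obtained by applying the inverse link function to this linear predictor, so the two parameterizations give the same fitted means whenever the linear predictors agree.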

Original language: English (US)
Journal: Statistical Analysis and Data Mining
State: Accepted/In press - 2021

Keywords

  • group structure
  • homogeneous regression coefficients
  • ontology
  • supervised clustering

ASJC Scopus subject areas

  • Analysis
  • Information Systems
  • Computer Science Applications
