On the consistency of Bayesian variable selection for high dimensional binary regression and classification

Wenxin Jiang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Modern data mining and bioinformatics present an important playground for statistical learning techniques, where the number of input variables may be much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and to form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression function that consistently estimate the truth, provided the true regression model is sparse in the sense that the aggregated size of the regression coefficients is bounded. The estimated regression function can therefore also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These results provide theoretical justification for some recent empirical successes in microarray data analysis.
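The sketch below illustrates, in Python, the kind of indicator-based Bayesian variable selection the abstract describes; it is a minimal hypothetical example, not the paper's implementation. It assumes a Bernoulli(pi) prior on each selection indicator, a Gaussian N(0, tau^2) "slab" prior on included coefficients, and a simple Metropolis sampler; the names `loglik`, `sample`, and all parameter values are illustrative choices.

```python
# Minimal sketch (assumed setup, not the paper's method): Bayesian variable
# selection for logistic regression using binary selection indicators gamma_j.
# Priors assumed here: gamma_j ~ Bernoulli(pi); beta_j | gamma_j = 1 ~ N(0, tau^2).
import numpy as np

def loglik(beta, X, y):
    """Logistic log-likelihood; logaddexp avoids overflow for large |eta|."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def sample(X, y, n_iter=5000, tau=2.0, pi=0.05, step=0.2, seed=0):
    """Metropolis sampler over (beta, gamma). Add/drop moves propose the new
    coefficient from its slab prior, so the slab density cancels and only the
    likelihood ratio and the prior odds pi/(1 - pi) enter the acceptance ratio."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(p, dtype=bool)
    ll = loglik(beta, X, y)
    draws = []
    for _ in range(n_iter):
        # Move 1: flip one indicator (add or drop a candidate variable).
        j = rng.integers(p)
        beta_new, gamma_new = beta.copy(), gamma.copy()
        if not gamma[j]:                        # add variable j
            gamma_new[j] = True
            beta_new[j] = rng.normal(0.0, tau)  # propose from the slab prior
            log_odds = np.log(pi / (1.0 - pi))
        else:                                   # drop variable j
            gamma_new[j] = False
            beta_new[j] = 0.0
            log_odds = np.log((1.0 - pi) / pi)
        ll_new = loglik(beta_new, X, y)
        if np.log(rng.uniform()) < ll_new - ll + log_odds:
            beta, gamma, ll = beta_new, gamma_new, ll_new
        # Move 2: random-walk update of the currently included coefficients,
        # with the N(0, tau^2) slab prior ratio included in the acceptance step.
        if gamma.any():
            beta_new = beta.copy()
            beta_new[gamma] += rng.normal(0.0, step, size=gamma.sum())
            ll_new = loglik(beta_new, X, y)
            log_prior_ratio = -0.5 * (beta_new[gamma] @ beta_new[gamma]
                                      - beta[gamma] @ beta[gamma]) / tau**2
            if np.log(rng.uniform()) < ll_new - ll + log_prior_ratio:
                beta, ll = beta_new, ll_new
        draws.append((beta.copy(), gamma.copy()))
    return draws
```

Averaging the sampled gamma vectors gives posterior inclusion frequencies for each variable, and a plug-in classifier can predict a positive output whenever the posterior-mean linear predictor exceeds zero, in line with the abstract's classification claim.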

Original language: English (US)
Pages (from-to): 2762-2776
Number of pages: 15
Journal: Neural Computation
Volume: 18
Issue number: 11
State: Published - Nov 2006

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Cognitive Neuroscience

