A Statistical Approach to Learning and Generalization in Layered Neural Networks

Esther Levin, Naftali Tishby, Sara A. Solla

Research output: Contribution to journal › Article

128 Scopus citations

Abstract

A general statistical description of the problem of learning from examples is presented. Our focus is on learning in layered networks, which is posed as a search in the network parameter space for a network that minimizes an additive error function over statistically independent examples. By imposing the equivalence of the minimum-error and maximum-likelihood criteria for training the network, we arrive at the Gibbs distribution on the ensemble of networks with a fixed architecture. Using this ensemble, the probability of correct prediction of a novel example can be expressed, serving as a measure of the network's generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network's performance. This quantity is derived directly from the ensemble's statistical properties and is identical to the stochastic complexity of the training data. Our approach links information-theoretic model-order-estimation techniques, particularly minimum description length, with the statistical mechanics of neural networks. The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
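
The construction described in the abstract can be illustrated on a toy model. Below is a minimal Python sketch, not the paper's actual formulation: it builds a Gibbs distribution over a hypothetical grid of scalar threshold classifiers from their additive training error, then evaluates the ensemble probability of correctly predicting a novel example and the entropy of that prediction distribution. The model family, all variable names, and the inverse temperature `beta` are illustrative assumptions.

```python
import numpy as np

# Toy Gibbs-ensemble construction (assumption: the "networks" here are
# scalar threshold classifiers, standing in for layered networks).
rng = np.random.default_rng(0)
ws = np.linspace(-2.0, 2.0, 201)           # candidate parameters (the ensemble)
x_train = rng.uniform(-2.0, 2.0, size=20)  # training inputs
y_train = (x_train > 0.5).astype(float)    # labels from a "true" threshold at 0.5

def error(w):
    # Additive error function over statistically independent examples.
    preds = (x_train > w).astype(float)
    return np.sum((preds - y_train) ** 2)

beta = 2.0                                  # assumed inverse temperature
E = np.array([error(w) for w in ws])
gibbs = np.exp(-beta * E)
gibbs /= gibbs.sum()                        # Gibbs distribution over the ensemble

# Ensemble probability of correctly predicting a novel example (x_new, y_new).
x_new, y_new = 0.7, 1.0
p_correct = np.sum(gibbs * ((x_new > ws).astype(float) == y_new))

# Entropy of the binary prediction distribution, a performance measure.
p = np.array([p_correct, 1.0 - p_correct])
entropy = -sum(pi * np.log2(pi) for pi in p if pi > 0)
print(f"P(correct) = {p_correct:.3f}, prediction entropy = {entropy:.3f} bits")
```

As more training examples are added, the Gibbs distribution concentrates on low-error networks, so the prediction entropy shrinks; tracking it per example is one way to read off a learning curve in this framework.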

Original language: English (US)
Pages (from-to): 1568-1574
Number of pages: 7
Journal: Proceedings of the IEEE
Volume: 78
Issue number: 10
DOIs
State: Published - Oct 1990

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
