A Statistical Approach to Learning and Generalization in Layered Neural Networks

Esther Levin, Naftali Tishby, Sara A. Solla

Research output: Contribution to journal › Article › peer-review

133 Scopus citations


A general statistical description of the problem of learning from examples is presented. Our focus is on learning in layered networks, which is posed as a search in the network parameter space for a network that minimizes an additive error function over statistically independent examples. By imposing the equivalence of the minimum-error and maximum-likelihood criteria for training the network, we arrive at the Gibbs distribution on the ensemble of networks with a fixed architecture. Using this ensemble, the probability of correct prediction of a novel example can be expressed, serving as a measure of the network's generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network's performance. This quantity is derived directly from the ensemble's statistical properties and is identical to the stochastic complexity of the training data. Our approach thus links information-theoretic model-order-estimation techniques, particularly minimum description length, with the statistical mechanics of neural networks. The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
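The ensemble construction sketched in the abstract can be illustrated on a toy model. The sketch below is a hypothetical minimal example, not the paper's setup: a one-parameter "network" (a logistic unit), a discretized parameter grid standing in for the continuous ensemble, and the negative log-likelihood as the additive error, so that the Gibbs distribution at beta = 1 makes the minimum-error and maximum-likelihood criteria coincide. The predictive probability of a novel example is then the ensemble average of the per-network prediction.

```python
import numpy as np

# Toy "network": a single-parameter logistic unit, p(y=1 | x, w) = sigmoid(w*x).
# This architecture, the data, and all names below are illustrative assumptions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def example_error(w, x, y):
    """Additive error = negative log-likelihood of one independent example."""
    p = sigmoid(w * x)
    return -np.log(p if y == 1 else 1.0 - p)

# Training set: statistically independent examples (x, y).
data = [(1.0, 1), (2.0, 1), (-1.0, 0), (-0.5, 0)]

# A discretized parameter grid stands in for the continuous ensemble.
ws = np.linspace(-5.0, 5.0, 201)
E = np.array([sum(example_error(w, x, y) for x, y in data) for w in ws])

# Gibbs distribution over the ensemble; beta = 1 makes the equivalence
# between minimum error and maximum likelihood exact for this error.
beta = 1.0
weights = np.exp(-beta * (E - E.min()))   # shift by E.min() for stability
posterior = weights / weights.sum()

# Probability of correctly predicting a novel example: average the
# per-network predictive probability over the Gibbs ensemble.
x_new, y_new = 1.5, 1
p_correct = float(np.sum(posterior * sigmoid(ws * x_new)))
```

Since the training data favor positive parameter values, the ensemble assigns a predictive probability above chance to the positive label on a novel positive input; the negative log of such predictive probabilities, accumulated over the data, is what the abstract identifies with the stochastic complexity.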

Original language: English (US)
Pages (from-to): 1568-1574
Number of pages: 7
Journal: Proceedings of the IEEE
Issue number: 10
State: Published - Oct 1990

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering


