Abstract
A general statistical description of the problem of learning from examples is presented. Our focus is on learning in layered networks, which is posed as a search in the network parameter space for a network that minimizes an additive error function of statistically independent examples. By imposing the equivalence of the minimum error and the maximum likelihood criteria for training the network, we arrive at the Gibbs distribution on the ensemble of networks with a fixed architecture. Using this ensemble, the probability of correct prediction of a novel example can be expressed, serving as a measure of the network’s generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network’s performance. This quantity is directly derived from the ensemble statistical properties and is identical to the stochastic complexity of the training data. Our approach is a link between the information-theoretic model-order-estimation techniques, particularly minimum description length, and the statistical mechanics of neural networks. The proposed formalism is applied to the problems of selecting an optimal architecture and the prediction of learning curves.
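As a rough illustration of the Gibbs distribution over an ensemble mentioned in the abstract, the sketch below weights a small, finite set of candidate models by exp(-beta * E), where E is an additive error summed over independent examples. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-parameter linear model, the squared-error function, the uniform prior over candidates, the value of `beta`, and the names `gibbs_weights` and `additive_error`.

```python
import math

def additive_error(w, examples):
    """Total error as a sum of per-example errors (squared error is an assumption)."""
    return sum((y - w * x) ** 2 for x, y in examples)

def gibbs_weights(total_errors, beta=1.0):
    """Normalized weights exp(-beta * E) / Z over a finite set of candidates.

    A uniform prior over the candidates is assumed. The minimum error is
    subtracted before exponentiating for numerical stability; this does not
    change the normalized weights.
    """
    e_min = min(total_errors)
    unnormalized = [math.exp(-beta * (e - e_min)) for e in total_errors]
    z = sum(unnormalized)  # partition function (up to the constant factor removed above)
    return [u / z for u in unnormalized]

# Hypothetical ensemble: five candidate slopes for the toy model y = w * x.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
# Hypothetical training examples generated by w = 1.
examples = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

errors = [additive_error(w, examples) for w in candidates]
weights = gibbs_weights(errors, beta=1.0)

# The candidate with minimum error is also the one with maximum Gibbs weight.
best = candidates[weights.index(max(weights))]
```

Because the weight decreases monotonically with the total error, the minimum-error candidate and the maximum-weight candidate coincide in this toy setting, which mirrors the equivalence of criteria that the abstract refers to.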
Original language | English (US) |
---|---|
Pages (from-to) | 1568-1574 |
Number of pages | 7 |
Journal | Proceedings of the IEEE |
Volume | 78 |
Issue number | 10 |
State | Published - Oct 1990 |
Funding
Naftali Z. Tishby was born in Jerusalem, Israel, in December 1952. He received the B.Sc. degree (cum laude) in physics and mathematics from the Hebrew University of Jerusalem in 1974, the M.Sc. degree (cum laude) in physics from Tel-Aviv University in 1980, and the Ph.D. degree in theoretical physics from the Hebrew University in 1985. From 1974 to 1981 he was with the Israel Defense Forces (IDF), where he established and headed a research group in signal and speech processing. During 1984-1985 he served as a Vice President of Research at Sesame Systems Ltd., developing speech and speaker recognition systems. In 1985-1986 he was a postdoctoral fellow at the Massachusetts Institute of Technology, working on chaotic Hamiltonian dynamics. Since 1987 he has been a Member of Technical Staff (information principles laboratory) at AT&T Bell Laboratories, Murray Hill, NJ. His current research subjects include nonlinear dynamics and its applications to speech processing, stochastic processes, learning theory, and statistical mechanics of neural networks. Dr. Tishby received the Eliyahu Golomb Israel Security Award in 1980 and the Chaim Weizmann fellowship in physics in 1985.
ASJC Scopus subject areas
- General Computer Science
- Electrical and Electronic Engineering