Abstract
The problem of on-line learning in two-layer neural networks is studied within the framework of statistical mechanics. A fully connected committee machine with K hidden units is trained by gradient descent to perform a task defined by a teacher committee machine with M hidden units acting on randomly drawn inputs. The approach, based on a direct averaging over the activation of the hidden units, results in a set of first-order differential equations that describes the dynamical evolution of the overlaps among the various hidden units and allows for a computation of the generalization error. The equations of motion are obtained analytically for general K and M and provide a powerful tool used here to study a variety of realizable, over-realizable, and unrealizable learning scenarios and to analyze the role of the learning rate in controlling the evolution and convergence of the learning process.
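The teacher-student setup described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's analytical method: it simulates the on-line gradient descent directly at finite input dimension instead of integrating the equations of motion. Several details are assumptions not stated in the abstract: the hidden-unit activation `g(x) = erf(x / sqrt(2))`, outputs formed as an unweighted sum of hidden-unit activations, a quadratic error, inputs with independent standard normal components, an orthonormal teacher, and a learning rate scaled as `eta / N`. The names `train`, `generalization_error`, `J`, and `B` are hypothetical. For the assumed erf activation, the generalization error follows in closed form from the overlaps via the Gaussian identity `<g(x) g(y)> = (2 / pi) * arcsin(C_xy / sqrt((1 + C_xx) * (1 + C_yy)))`.

```python
import math
import random


def g(x):
    # Assumed hidden-unit activation (not stated in the abstract).
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    # Derivative of erf(x / sqrt(2)).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generalization_error(J, B):
    """Average of 0.5 * (student - teacher)^2 over Gaussian inputs,
    written in terms of the overlaps among student (J) and teacher (B)
    weight vectors. Valid only for the assumed erf activation."""
    def term(u, v):
        norm = math.sqrt((1.0 + dot(u, u)) * (1.0 + dot(v, v)))
        return math.asin(dot(u, v) / norm)

    student_student = sum(term(a, b) for a in J for b in J)
    teacher_teacher = sum(term(a, b) for a in B for b in B)
    student_teacher = sum(term(a, b) for a in J for b in B)
    return (student_student + teacher_teacher - 2.0 * student_teacher) / math.pi


def train(N=50, K=2, M=2, eta=0.5, steps=2000, seed=0):
    """On-line gradient descent: each random input is used once, then discarded."""
    rng = random.Random(seed)
    # Teacher: M orthonormal vectors (standard basis), an illustrative choice.
    B = [[1.0 if j == n else 0.0 for j in range(N)] for n in range(M)]
    # Student: K hidden units with small random initial weights.
    J = [[rng.gauss(0.0, 1e-2) for _ in range(N)] for _ in range(K)]
    errors = [generalization_error(J, B)]
    for t in range(steps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]  # fresh random input
        x = [dot(Ji, xi) for Ji in J]  # student hidden-unit activations
        y = [dot(Bn, xi) for Bn in B]  # teacher hidden-unit activations
        residual = sum(g(v) for v in y) - sum(g(v) for v in x)
        for i in range(K):
            delta = g_prime(x[i]) * residual
            for j in range(N):
                J[i][j] += (eta / N) * delta * xi[j]
        if (t + 1) % N == 0:
            # Record the error once per N examples.
            errors.append(generalization_error(J, B))
    return J, B, errors


if __name__ == "__main__":
    J, B, errors = train()
    print("initial error:", errors[0])
    print("final error:  ", errors[-1])
```

Choosing `K` equal to, larger than, or smaller than `M` in `train` corresponds to the realizable, over-realizable, and unrealizable scenarios mentioned in the abstract, and varying `eta` gives a rough feel for how the learning rate affects convergence.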
Original language | English (US) |
---|---|
Pages (from-to) | 4225-4243 |
Number of pages | 19 |
Journal | Physical Review E |
Volume | 52 |
Issue number | 4 |
State | Published - Jan 1 1995 |
ASJC Scopus subject areas
- Statistical and Nonlinear Physics
- Mathematical Physics
- Condensed Matter Physics
- Physics and Astronomy (all)