On the use of stochastic Hessian information in optimization methods for machine learning

Richard H. Byrd*, Gillian M. Chin, Will Neveitt, Jorge Nocedal

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

181 Scopus citations

Abstract

This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation approach. Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
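
The following is a minimal sketch, in Python/NumPy, of the subsampled-Hessian Newton-CG idea the abstract describes: the gradient is computed on the full batch, while the Hessian-vector products inside the conjugate gradient (CG) solve use a much smaller subsample. The logistic-regression objective, sample sizes, and CG settings here are illustrative assumptions, not the authors' exact algorithm or experimental setup.

    import numpy as np

    def logistic_loss_grad(w, X, y):
        """Average logistic loss and gradient over the rows of X (labels y in {0,1})."""
        z = X @ w
        p = 1.0 / (1.0 + np.exp(-z))
        loss = np.mean(np.log1p(np.exp(-np.where(y == 1, z, -z))))
        grad = X.T @ (p - y) / len(y)
        return loss, grad

    def hessian_vec(w, Xs, v):
        """Hessian-vector product of the logistic loss, restricted to the subsample Xs."""
        z = Xs @ w
        p = 1.0 / (1.0 + np.exp(-z))
        d = p * (1.0 - p)                      # diagonal of the GLM weight matrix
        return Xs.T @ (d * (Xs @ v)) / len(Xs)

    def newton_cg_step(w, X, y, hess_sample=512, cg_iters=10, tol=1e-6):
        """One inexact Newton step: full-batch gradient, subsampled Hessian inside CG."""
        _, g = logistic_loss_grad(w, X, y)
        idx = np.random.choice(len(y), size=min(hess_sample, len(y)), replace=False)
        Xs = X[idx]                            # small Hessian subsample
        # Matrix-free CG on H_s p = -g, using only Hessian-vector products.
        p = np.zeros_like(w)
        r = -g.copy()
        d = r.copy()
        rs = r @ r
        for _ in range(cg_iters):
            Hd = hessian_vec(w, Xs, d)
            alpha = rs / (d @ Hd)
            p += alpha * d
            r -= alpha * Hd
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            d = r + (rs_new / rs) * d
            rs = rs_new
        return w + p                           # a line search would normally follow

Because each CG iteration touches only the subsample, the per-step cost of the curvature information is small relative to the full-batch gradient, which is the efficiency argument the abstract makes.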

Original language: English (US)
Pages (from-to): 977-995
Number of pages: 19
Journal: SIAM Journal on Optimization
Volume: 21
Issue number: 3
DOIs
State: Published - 2011

Keywords

  • Machine learning
  • Stochastic optimization
  • Unconstrained optimization

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Applied Mathematics
