Sample size selection in optimization methods for machine learning

Richard H. Byrd, Gillian M. Chin, Jorge Nocedal*, Yuchen Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

168 Scopus citations

Abstract

This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian-vector products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. In the third part of the paper, the focus shifts to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
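To make the dynamic sampling idea in the abstract concrete, the sketch below implements a variance-based sample-size test of the kind described in the first part of the paper: the per-example gradients of the current batch supply a variance estimate, and the batch is enlarged when that variance is large relative to the squared norm of the batch gradient. The threshold θ, the function names (batch_gradient_with_variance, proposed_sample_size, logistic_grad), and the logistic-regression driver are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def batch_gradient_with_variance(grad_fn, X, y, sample_idx, w):
    """Mean gradient over the sample and the componentwise sample variance
    of the per-example gradients (both byproducts of one batch pass)."""
    G = np.stack([grad_fn(w, X[i], y[i]) for i in sample_idx])  # shape (|S|, d)
    return G.mean(axis=0), G.var(axis=0, ddof=1)

def proposed_sample_size(g, var, n_S, theta):
    """Variance test of the form  sum(var)/|S| <= theta^2 * ||g||^2
    (illustrative; the paper's exact condition may differ). Returns the
    current size if the test holds, otherwise a larger size chosen so the
    estimated variance would satisfy the inequality."""
    rhs = (theta ** 2) * float(np.dot(g, g))
    if rhs == 0.0 or var.sum() / n_S <= rhs:
        return n_S
    return int(np.ceil(var.sum() / rhs))

# Illustrative driver: gradient descent on a synthetic logistic-regression
# problem, enlarging the batch whenever the variance test fails.
rng = np.random.default_rng(0)
n, d = 10_000, 20
X = rng.normal(size=(n, d))
y = 2 * rng.integers(0, 2, size=n) - 1             # labels in {-1, +1}

def logistic_grad(w, xi, yi):                       # per-example loss gradient
    return -yi * xi / (1.0 + np.exp(yi * (xi @ w)))

w, batch, theta, alpha = np.zeros(d), 64, 0.5, 0.1
for _ in range(50):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    g, var = batch_gradient_with_variance(logistic_grad, X, y, idx, w)
    batch = max(batch, proposed_sample_size(g, var, len(idx), theta))
    w -= alpha * g
```

The proposed size can exceed the number of available examples, so the driver simply caps the batch at n; how the paper handles that case, and the exact constants used, are not specified in the abstract.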

Original language: English (US)
Pages (from-to): 127-155
Number of pages: 29
Journal: Mathematical Programming
Volume: 134
Issue number: 1
DOIs
State: Published - Aug 2012

ASJC Scopus subject areas

  • Software
  • Mathematics (all)
