A Progressive Batching L-BFGS Method for Machine Learning

Raghu Bollapragada*, Dheevatsa Mudigere, Jorge Nocedal, Hao Jun Michael Shi, Ping Tak Peter Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.
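To give a concrete picture of how the three components fit together, below is a minimal, self-contained sketch of a progressive-batching L-BFGS loop on a synthetic least-squares problem. It is an illustration only, not the authors' algorithm: the geometric batch-growth rule, the plain backtracking (Armijo) line search on the sampled batch, and the simple curvature-pair skipping test are simplified stand-ins for the variance-based sample-size control, stochastic line search, and stable quasi-Newton updating developed in the paper, and all names and constants are invented for the example.

# Illustrative sketch of a progressive-batching L-BFGS loop (NOT the paper's
# exact method): geometric batch growth, Armijo backtracking on the sampled
# batch, and a curvature-skip test stand in for the paper's mechanisms.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic problem: minimize (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
n, d = 10_000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def loss_grad(x, idx):
    """Loss and gradient on the mini-batch indexed by idx."""
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2), A[idx].T @ r / len(idx)

def two_loop(g, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns H_k * g."""
    q, alphas = g.copy(), []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_list:                                   # scale by most recent pair
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        q += (a - (y @ q) / (y @ s)) * s
    return q

x = np.zeros(d)
s_list, y_list, m = [], [], 10                   # keep at most 10 curvature pairs
batch = 64                                       # initial sample size

for k in range(100):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    f, g = loss_grad(x, idx)
    p = -two_loop(g, s_list, y_list)             # quasi-Newton search direction

    # Backtracking Armijo line search on the same batch (a simplification of
    # the stochastic line search described in the abstract).
    alpha = 1.0
    while loss_grad(x + alpha * p, idx)[0] > f + 1e-4 * alpha * (g @ p):
        alpha *= 0.5
        if alpha < 1e-8:
            break

    x_new = x + alpha * p
    _, g_new = loss_grad(x_new, idx)             # gradient at new point, same batch
    s, y = x_new - x, g_new - g
    if y @ s > 1e-10 * (s @ s):                  # skip the update if curvature is unreliable
        s_list.append(s); y_list.append(y)
        if len(s_list) > m:
            s_list.pop(0); y_list.pop(0)
    x = x_new

    batch = min(n, int(1.2 * batch))             # progressively enlarge the batch
    if k % 10 == 0:
        full_loss = 0.5 * np.mean((A @ x - b) ** 2)
        print(f"iter {k:3d}  batch {batch:5d}  full loss {full_loss:.6f}")

As the batch grows, the stochastic gradient becomes less noisy, so the curvature pairs and the line search behave increasingly like their full-batch counterparts; the paper replaces the fixed growth factor used above with an adaptive test on the sampled gradients.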

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 989-1013
Number of pages: 25
Volume: 2
ISBN (Electronic): 9781510867963
State: Published - Jan 1 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 - Jul 15 2018

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Bollapragada, R., Mudigere, D., Nocedal, J., Shi, H. J. M., & Tang, P. T. P. (2018). A Progressive Batching L-BFGS Method for Machine Learning. In J. Dy, & A. Krause (Eds.), 35th International Conference on Machine Learning, ICML 2018 (Vol. 2, pp. 989-1013). International Machine Learning Society (IMLS).