Abstract
Synchronous Stochastic Gradient Descent (SGD) with data parallelism, the most popular parallel training strategy for deep learning, suffers from expensive gradient communication. Local SGD with periodic model averaging is a promising alternative to synchronous SGD. The algorithm allows each worker to update its own model locally and periodically averages the model parameters across all the workers. While this algorithm enjoys less frequent communication, its convergence rate is strongly affected by the number of workers. To scale up local SGD training without losing accuracy, the number of workers should be sufficiently small so that the model converges reasonably fast. In this paper, we discuss how to exploit the degree of parallelism in local SGD while maintaining model accuracy. Our training strategy employs multiple groups of processes, and each group trains a local model using data parallelism. The local models are periodically averaged across all the groups. Based on this hierarchical parallelism, we design a model averaging algorithm with a cheaper communication cost than the allreduce-based approach. We also propose a practical metric for finding the maximum number of workers that does not cause a significant accuracy loss. Our experimental results demonstrate that our proposed training strategy provides significantly improved scalability while achieving a model accuracy comparable to synchronous SGD.
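The hierarchical scheme the abstract describes can be illustrated in a few lines of PyTorch. The following is a minimal sketch, not the paper's implementation: the group partitioning, the averaging period `tau`, and the use of a plain global allreduce for the cross-group average (standing in for the paper's cheaper averaging algorithm, which is not reproduced here) are all illustrative assumptions.

```python
# Sketch of hierarchical local SGD (illustration only, not the authors' code):
# synchronous SGD inside each group every step, parameter averaging across
# groups every `tau` steps.
import torch
import torch.distributed as dist

def make_groups(world_size, group_size):
    """Partition ranks into contiguous groups of `group_size` workers.
    Every rank must call dist.new_group for every group."""
    return [dist.new_group(list(range(s, s + group_size)))
            for s in range(0, world_size, group_size)]

def train(model, optimizer, loader, tau, group, world_size):
    step = 0
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        # Synchronous SGD inside the group: average gradients every step.
        group_size = dist.get_world_size(group=group)
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=group)
            p.grad /= group_size
        optimizer.step()
        step += 1
        # Local SGD across groups: periodically average parameters over all
        # workers. Since models within a group are identical, this equals
        # averaging one representative model per group.
        if step % tau == 0:
            with torch.no_grad():
                for p in model.parameters():
                    dist.all_reduce(p, op=dist.ReduceOp.SUM)
                    p /= world_size
```

The design intent, per the abstract, is that the frequent (per-step) communication is confined to small groups, while the expensive all-worker exchange happens only once every `tau` steps, which is what decouples the total worker count from the communication bottleneck of fully synchronous SGD.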
Original language | English (US)
---|---
Title of host publication | Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Editors | Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Publisher | Institute of Electrical and Electronics Engineers Inc.
Pages | 718-727
Number of pages | 10
ISBN (Electronic) | 9781728162515
State | Published - Dec 10 2020
Event | 8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Atlanta, United States (Dec 10 2020 → Dec 13 2020)
Publication series

Name | Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Conference

Conference | 8th IEEE International Conference on Big Data, Big Data 2020
---|---
Country/Territory | United States
City | Virtual, Atlanta
Period | 12/10/20 → 12/13/20
Funding
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program, under the "RAPIDS Institute". This work is also supported in part by DOE awards DE-SC0021399, DE-SC0014330, DE-SC0019358, and NIST award 70NANB19H005. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Keywords
- Deep Learning
- Local SGD
- Parallel Training
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Information Systems and Management
- Safety, Risk, Reliability and Quality