Improving Scalability of Parallel CNN Training by Adjusting Mini-Batch Size at Run-Time

Sunwoo Lee, Qiao Kang, Sandeep Madireddy, Prasanna Balaprakash, Ankit Agrawal, Alok Choudhary, Richard Archibald, Wei Keng Liao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Training a Convolutional Neural Network (CNN) is a computationally intensive task that requires efficient parallelization to shorten the execution time. Given the ever-increasing size of available training data, the parallelization of CNN training becomes even more important. Data parallelism, a popular parallelization strategy that distributes the input data among compute processes, requires the mini-batch size to be sufficiently large to achieve a high degree of parallelism. However, training with a large batch size is known to produce lower convergence accuracy. In image restoration problems, for example, the batch size is typically tuned to a small value between 16 and 64, making it challenging to scale up the training. In this paper, we propose a parallel CNN training strategy that gradually increases the mini-batch size and learning rate at run-time. While improving scalability, this strategy also maintains accuracy close to that of training with a fixed small batch size. We evaluate the performance of the proposed parallel CNN training algorithm on image regression and classification applications using various models and datasets.
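The abstract describes gradually increasing the mini-batch size and learning rate during training. A minimal sketch of such a schedule is shown below; the doubling interval, growth factor, cap, and linear learning-rate scaling are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a run-time batch-size / learning-rate schedule.
# The growth factor, interval, cap, and linear LR scaling are
# assumptions for illustration, not the authors' specific method.

def schedule(epoch, base_batch=16, base_lr=0.01,
             growth=2, interval=10, max_batch=256):
    """Return (batch_size, learning_rate) for a given epoch.

    The batch size is multiplied by `growth` every `interval`
    epochs (capped at `max_batch`); the learning rate is scaled
    linearly with the batch size, following the common
    linear-scaling heuristic.
    """
    steps = epoch // interval
    batch = min(base_batch * growth ** steps, max_batch)
    lr = base_lr * (batch / base_batch)
    return batch, lr

if __name__ == "__main__":
    for e in (0, 10, 20, 50):
        print("epoch", e, "->", schedule(e))
```

Starting small preserves the convergence behavior of small-batch SGD in the early epochs, while the later, larger batches expose enough data parallelism for the training to scale across many processes.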

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 830-839
Number of pages: 10
ISBN (Electronic): 9781728108582
DOIs
State: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9, 2019 – Dec 12, 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country: United States
City: Los Angeles
Period: 12/9/19 – 12/12/19

Keywords

  • Adaptive Batch Size
  • Convolutional Neural Network
  • Deep Learning
  • Parallelization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
