Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Training a Convolutional Neural Network (CNN) is a computationally intensive task whose parallelization has become critical in order to complete the training in an acceptable time. However, there are two obstacles to developing a scalable parallel CNN in a distributed-memory computing environment. One is the high degree of data dependency that the model parameters exhibit between every two adjacent mini-batches, and the other is the large amount of data to be transferred across the communication channel. In this paper, we present a parallelization strategy that maximizes the overlap of inter-process communication with computation. The overlap is achieved by using a thread per compute node to initiate communication as soon as the gradients become available. Because the output data of the backpropagation stage are generated at each model layer, the communication for those data can run concurrently with the computation of the other layers. To study the effectiveness of the overlapping and its impact on scalability, we evaluated various model architectures and hyperparameter settings. When training the VGG-A model on the ImageNet data set, we achieve speedups of 62.97× and 77.97× on 128 compute nodes using mini-batch sizes of 256 and 512, respectively.
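As a concrete illustration of the overlapping idea described in the abstract, the sketch below issues a non-blocking allreduce for each layer's gradient as soon as backpropagation produces it, so the communication for that layer proceeds while the gradients of the remaining layers are still being computed. This is only a minimal sketch, not the authors' implementation: it uses mpi4py's non-blocking Iallreduce in place of the dedicated communication thread the paper describes, and the layer shapes and the backward_layer() helper are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): layer-wise gradient communication
# overlapped with backpropagation using non-blocking MPI collectives.
# Layer shapes and backward_layer() are hypothetical placeholders.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()

# Hypothetical per-layer gradient shapes, listed in the order backpropagation
# produces them (last layer first).
layer_shapes = [(1000, 4096), (4096, 4096), (4096, 25088)]

def backward_layer(shape):
    """Placeholder for computing one layer's local gradient."""
    return np.random.rand(*shape).astype(np.float32)

grads, sums, requests = [], [], []
for shape in layer_shapes:
    grad = backward_layer(shape)      # compute this layer's gradient
    total = np.empty_like(grad)
    grads.append(grad)                # keep buffers alive while in flight
    sums.append(total)
    # Start the allreduce immediately; it runs in the background while the
    # next layer's gradient is being computed.
    requests.append(comm.Iallreduce(grad, total, op=MPI.SUM))

# Backpropagation for the mini-batch is done; wait for any outstanding
# communication, then average the summed gradients across processes
# before applying the parameter update.
MPI.Request.Waitall(requests)
averaged = [s / nprocs for s in sums]
```

The paper achieves the same effect with a per-node communication thread that initiates the transfers, which keeps the computation threads from ever blocking on a communication call; the non-blocking collectives above are simply a compact way to show the layer-by-layer overlap.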

Original language: English (US)
Title of host publication: Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 183-192
Number of pages: 10
ISBN (Electronic): 9781538622933
DOI: 10.1109/HiPC.2017.00030
State: Published - Feb 7 2018
Event: 24th IEEE International Conference on High Performance Computing, HiPC 2017 - Jaipur, India
Duration: Dec 18 2017 - Dec 21 2017

Publication series

Name: Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017
Volume: 2017-December

Other

Other: 24th IEEE International Conference on High Performance Computing, HiPC 2017
Country: India
City: Jaipur
Period: 12/18/17 - 12/21/17

Keywords

  • Communication
  • Convolutional Neural Network
  • Deep Learning
  • Overlapping
  • Parallelization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Modeling and Simulation

Cite this

Lee, S., Jha, D., Agrawal, A., Choudhary, A. N., & Liao, W-K. (2018). Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication. In Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017 (pp. 183-192). (Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017; Vol. 2017-December). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HiPC.2017.00030
Lee, Sunwoo; Jha, Dipendra; Agrawal, Ankit; Choudhary, Alok Nidhi; Liao, Wei-Keng. Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication. In Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017. Institute of Electrical and Electronics Engineers Inc., 2018, pp. 183-192. (Proceedings - 24th IEEE International Conference on High Performance Computing, HiPC 2017).