TY - GEN
T1 - Using Multi-Resolution Data to Accelerate Neural Network Training in Scientific Applications
AU - Wang, Kewei
AU - Lee, Sunwoo
AU - Balewski, Jan
AU - Sim, Alex
AU - Nugent, Peter
AU - Agrawal, Ankit
AU - Choudhary, Alok
AU - Wu, Kesheng
AU - Liao, Wei-Keng
N1 - Funding Information:
This work is supported in part by the U.S. Department of Energy (DOE) under award numbers DE-SC0021399 and DE-SC0019358, DOE Contract No. DE-AC02-05CH11231, the National Institute of Standards and Technology under award number 70NANB19H005, and the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231, using NERSC awards ASCR-ERCAP0021094 and ASCR-ERCAP0021411. This research also used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Neural networks are powerful solutions to many scientific applications; however, they usually require long model training times due to large training data sets or large model sizes. Research has focused on developing numerical optimization algorithms and parallel processing to reduce the training time. In this work, we propose a multi-resolution strategy that reduces the training time by training the model on reduced-resolution data samples at the beginning and later switching to the original-resolution data samples. This strategy is motivated by the observation that coarser versions of many problems can be solved faster than their denser counterparts, and the solution to a coarser problem can be used to initialize the solution to the denser problem. Applied to neural network training, coarse data can have an effect on the learning curves at the early stage similar to that of the dense data, but requires less time. Once the curves no longer improve significantly, our strategy switches to the data in the original resolution. The key to this process is the ability to generate multiple resolutions of a problem automatically, which can usually be done for scientific applications with spatial and temporal continuity. We use two real-world scientific applications, CosmoFlow and DeepCAM, to evaluate the proposed mixed-resolution training strategy. Our experimental results demonstrate that the proposed strategy effectively reduces the end-to-end training time while achieving accuracy comparable to that of training only with the original-resolution data. While maintaining the same model accuracy, our multi-resolution training strategy reduces the end-to-end training time by up to 30% and 23% for CosmoFlow and DeepCAM, respectively.
KW - Deep Learning
KW - Multi-resolution Data
KW - Transfer Learning
UR - http://www.scopus.com/inward/record.url?scp=85135762236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135762236&partnerID=8YFLogxK
U2 - 10.1109/CCGrid54584.2022.00050
DO - 10.1109/CCGrid54584.2022.00050
M3 - Conference contribution
AN - SCOPUS:85135762236
T3 - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
SP - 404
EP - 413
BT - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
A2 - Fazio, Maria
A2 - Panda, Dhabaleswar K.
A2 - Prodan, Radu
A2 - Cardellini, Valeria
A2 - Kantarci, Burak
A2 - Rana, Omer
A2 - Villari, Massimo
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
Y2 - 16 May 2022 through 19 May 2022
ER -
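
A minimal sketch of the mixed-resolution schedule the abstract describes: train on coarsened samples first, then switch to the original resolution once the learning curve plateaus. This is an illustration under assumptions, not the authors' implementation; the model, the plateau test, and all names (coarsen, TinyRegressor, plateaued) and hyperparameters are hypothetical, and the paper's actual switching criterion is derived from its observed learning curves.

import torch
import torch.nn as nn
import torch.nn.functional as F

def coarsen(x, factor=2):
    # Reduce the spatial resolution of a batch of 3D volumes (N, C, D, H, W)
    # by average pooling, mimicking a reduced-resolution copy of the data set.
    return F.avg_pool3d(x, kernel_size=factor)

class TinyRegressor(nn.Module):
    # Stand-in for a CosmoFlow-style 3D CNN. Global average pooling makes it
    # resolution-agnostic, so the same weights accept both training phases.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        h = F.relu(self.conv(x))
        h = h.mean(dim=(2, 3, 4))  # pool over D, H, W
        return self.head(h)

def plateaued(history, window=3, tol=1e-3):
    # Crude plateau test: the last `window` losses changed by less than tol.
    recent = history[-window:]
    return len(history) >= window and max(recent) - min(recent) < tol

# Synthetic stand-ins for full-resolution training volumes and targets.
x_full = torch.randn(16, 1, 32, 32, 32)
y = torch.randn(16, 1)

model = TinyRegressor()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
mse = nn.MSELoss()

x_train = coarsen(x_full)  # phase 1: coarse data, roughly 8x less compute per epoch
history = []
for epoch in range(50):
    opt.zero_grad()
    loss = mse(model(x_train), y)
    loss.backward()
    opt.step()
    history.append(loss.item())
    if x_train is not x_full and plateaued(history):
        x_train = x_full   # phase 2: switch to the original resolution
        history.clear()    # restart plateau bookkeeping for the new phase

Because the early epochs run on volumes with roughly one eighth of the voxels, most of the end-to-end savings come from phase 1; the global pooling design choice is what lets one set of weights carry over to the full-resolution phase without reinitialization, matching the abstract's coarse-to-dense initialization argument.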