Abstract
MPI collective communication is an omnipresent communication model for high-performance computing (HPC) systems. The performance of a collective operation depends strongly on the algorithm used to implement it. MPI libraries use inaccurate heuristics to select these algorithms, causing applications to suffer unnecessary slowdowns. Machine learning (ML)-based autotuners are a promising alternative. ML autotuners can intelligently select algorithms for individual jobs, resulting in near-optimal performance. However, these approaches currently spend more time training than they save by accelerating applications, rendering them impractical. We make the case that ML-based collective algorithm selection autotuners can be made practical and accelerate production applications on large-scale supercomputers. We identify multiple impracticalities in the existing work, such as inefficient training point selection and ignoring non-power-of-two feature values. We address these issues through variance-based point selection and model testing alongside topology-aware benchmark parallelization. Our approach minimizes training time by eliminating unnecessary training points and maximizing machine utilization. We incorporate our improvements in a prototype active learning system, ACCLAiM (Advancing Collective Communication (L) Autotuning using Machine Learning). We show that each of ACCLAiM's advancements significantly reduces training time compared with the best existing machine learning approach. Then we apply ACCLAiM on a leadership-class supercomputer and demonstrate the conditions where ACCLAiM can accelerate HPC applications, proving the advantage of ML autotuners in a production setting for the first time.
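To make the abstract's variance-based point selection concrete, the sketch below shows one common way such an active-learning loop can be realized. It is an illustration under stated assumptions, not ACCLAiM's actual implementation: it uses a scikit-learn random forest as the surrogate model, treats disagreement among the forest's trees as the prediction "variance," and `benchmark_collective` is a hypothetical stand-in for timing one configuration (e.g., message size and node count) and returning its fastest collective algorithm.

```python
# Illustrative active-learning loop with variance-based point selection.
# NOT ACCLAiM's implementation: the surrogate model, the disagreement-based
# "variance," and benchmark_collective() are all assumptions for this sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def prediction_variance(model, points):
    """Per-point disagreement among the forest's trees (a proxy for variance)."""
    # Each tree votes for an (encoded) algorithm class per candidate point.
    votes = np.stack([tree.predict(points) for tree in model.estimators_])
    majority = np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
    return (votes != majority).mean(axis=0)  # fraction of dissenting trees


def benchmark_collective(point):
    """Hypothetical: benchmark one configuration, return its fastest algorithm."""
    raise NotImplementedError


def select_training_points(candidates, seed_idx, budget, var_threshold=0.1):
    """Benchmark only the points the surrogate model is least certain about."""
    X = candidates[seed_idx]
    y = np.array([benchmark_collective(p) for p in X])
    remaining = np.delete(candidates, seed_idx, axis=0)
    model = RandomForestClassifier(n_estimators=100).fit(X, y)
    for _ in range(budget):
        if len(remaining) == 0:
            break
        var = prediction_variance(model, remaining)
        if var.max() < var_threshold:
            break  # model already confident everywhere: stop training early
        pick = var.argmax()  # the most uncertain configuration gets benchmarked
        X = np.vstack([X, remaining[pick]])
        y = np.append(y, benchmark_collective(remaining[pick]))
        remaining = np.delete(remaining, pick, axis=0)
        model = RandomForestClassifier(n_estimators=100).fit(X, y)
    return model
```

The payoff matches the abstract's claim: only configurations where the model remains uncertain are benchmarked, which is what eliminates unnecessary training points; the paper further reduces wall-clock cost by parallelizing those benchmarks in a topology-aware manner.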
| Original language | English (US) |
| --- | --- |
| Title of host publication | Proceedings - 2022 IEEE International Conference on Cluster Computing, CLUSTER 2022 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 161-171 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781665498562 |
| DOIs | |
| State | Published - 2022 |
| Event | 2022 IEEE International Conference on Cluster Computing, CLUSTER 2022, Heidelberg, Germany; Sep 6 2022 → Sep 9 2022 |
Publication series
| Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
| --- | --- |
| Volume | 2022-September |
| ISSN (Print) | 1552-5244 |
Conference
| Conference | 2022 IEEE International Conference on Cluster Computing, CLUSTER 2022 |
| --- | --- |
| Country/Territory | Germany |
| City | Heidelberg |
| Period | 9/6/22 → 9/9/22 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration; by the U.S. Department of Energy, Office of Science, under Contract DE-AC02-06CH11357; and by the U.S. National Science Foundation via award CCF-2119069. This research used Bebop, a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory, and the resources of the Argonne Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Keywords
- MPI
- autotuning
- machine learning
- collective communication
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Signal Processing