Performance portability has become one of the most critical goals for high performance computing (HPC) applications and systems as we move toward the exascale era and beyond. In communication subsystems such as MPI, there is a large performance tuning space that depends heavily on the choice of internal algorithms and the values of their parameters. The optimal configuration of these algorithms and parameters, however, is platform-specific, scale-specific, and time-specific. Traditional tuning approaches require either offline exhaustive search or runtime heuristics, so developers and users have to make careful tradeoffs between tuning overhead and potential performance gain. On large-scale systems, moreover, such approaches may be impractical because of the significant tuning overhead. Machine learning techniques have shown potential for performance autotuning in recent years, especially when integrated with compiler systems (for tuning loops, recursion, and other recurrences), language runtime systems, and operating systems. This project aims to investigate the efficient use of machine learning techniques in performance analysis and autotuning for various aspects of communication runtime systems.
The PIs, Professors Peter Dinda and Nikos Hardavellas, and their co-advised Ph.D. student, Michael Wilkins, will conduct research on machine learning-based performance autotuning for distributed-memory communication systems in HPC. The primary use of funds under this contract will be to support Michael Wilkins, who will contribute the bulk of the time and will also serve as the bridge between Northwestern and Argonne for this project. It is understood that the primary goal of this contract is to enable Michael Wilkins to carry out this project as guided Ph.D. research. A secondary goal is to determine whether the project might, over a longer term, constitute his thesis.
The work in FY21 will focus on developing a machine learning-based tuning framework for the essential MPI collective communication primitives and on evaluating that framework with applications. Because collectives are the most performance-critical communication pattern for many DOE applications, the outcome of this contract will directly benefit those applications when run on existing systems. It will also provide the first steps toward performance-portable collective communication support that can adapt to the upcoming Argonne Aurora supercomputer.
Effective start/end date: 3/1/21 → 9/30/23
- UChicago Argonne, LLC, Argonne National Laboratory (8J-30009-0027B (Revised))
- Department of Energy (8J-30009-0027B (Revised))