TY - GEN
T1 - Compiling Loop-Based Nested Parallelism for Irregular Workloads
AU - Su, Yian
AU - Rainey, Mike
AU - Wanninger, Nick
AU - Dhiantravan, Nadharm
AU - Liang, Jasper
AU - Acar, Umut A.
AU - Dinda, Peter
AU - Campanoni, Simone
N1 - Publisher Copyright:
© 2024 Association for Computing Machinery. All rights reserved.
PY - 2024/4/27
Y1 - 2024/4/27
AB - Modern programming languages offer special syntax and semantics for logical fork-join parallelism in the form of parallel loops, allowing them to be nested, e.g., a parallel loop within another parallel loop. This expressiveness comes at a price, however: on modern multicore systems, realizing logical parallelism results in overheads due to the creation and management of parallel tasks, which can wipe out the benefits of parallelism. Today, we expect application programmers to cope with these overheads by manually tuning and optimizing their code. Such tuning requires programmers to reason about architectural factors hidden behind layers of software abstractions, such as task scheduling and load balancing. Managing these factors is particularly challenging when workloads are irregular, because their performance is input-sensitive. This paper presents HBC, the first compiler that translates C/C++ programs with high-level, fork-join constructs (e.g., OpenMP) to binaries capable of automatically controlling the cost of parallelism and dealing with irregular, input-sensitive workloads. The basis of our approach is Heartbeat Scheduling, a recent proposal for automatic granularity control that is backed by formal guarantees on performance. HBC binaries outperform OpenMP binaries for workloads for which even entirely manual solutions struggle to find the right balance between parallelism and its costs.
UR - http://www.scopus.com/inward/record.url?scp=85192198908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192198908&partnerID=8YFLogxK
DO - 10.1145/3620665.3640405
M3 - Conference contribution
AN - SCOPUS:85192198908
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 232
EP - 250
BT - Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024), Summer Cycle
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
Y2 - 27 April 2024 through 1 May 2024
ER -