TY - GEN
T1 - Task Parallel Assembly Language for Uncompromising Parallelism
AU - Rainey, Mike
AU - Newton, Ryan R.
AU - Hale, Kyle
AU - Hardavellas, Nikos
AU - Campanoni, Simone
AU - Dinda, Peter
AU - Acar, Umut A.
N1 - Funding Information:
Supported by the National Science Foundation under grant numbers CCF-1901381 and 2028921.
Funding Information:
R. Newton. This work was made possible by support from the National Science Foundation via awards CCF-2127277 and SPX-1725679.
Funding Information:
S. Campanoni. This work was made possible by support from the National Science Foundation via award CCF-1908488 (also, CNS-1763743, and CCF-2028851).
Funding Information:
K. Hale. This work was made possible by support from the National Science Foundation via awards CNS-1763612, CNS-1718252, and CCF-2028958.
Funding Information:
This work was made possible by support from the National Science Foundation via awards CCF-1533560, CNS-1763743, and CCF-2028851.
Publisher Copyright:
© 2021 ACM.
PY - 2021/6/18
Y1 - 2021/6/18
AB - Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts of available, latent parallelism, at a regular rate, based on even beats in time. The idea is to amortize the overheads of parallelism over the useful work performed between the beats. Heartbeat scheduling is promising in theory, but the reality is complicated: it has no known practical implementation. In this paper, we propose a practical approach to heartbeat scheduling that involves equipping the assembly language with a small set of primitives. These primitives leverage existing kernel and hardware support for interrupts to allow parallelism to remain latent, until a heartbeat, when it can be manifested with low cost. Our Task Parallel Assembly Language (TPAL) is a compact, RISC-like assembly language. We specify TPAL through an abstract machine and implement the abstract machine as compiler transformations for C/C++ code and a specialized run-time system. We present an evaluation on both the Linux and the Nautilus kernels, considering a range of heartbeat interrupt mechanisms. The evaluation shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.
KW - granularity control
KW - parallel programming languages
UR - http://www.scopus.com/inward/record.url?scp=85108914971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108914971&partnerID=8YFLogxK
U2 - 10.1145/3453483.3460969
DO - 10.1145/3453483.3460969
M3 - Conference contribution
AN - SCOPUS:85108914971
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 1064
EP - 1079
BT - PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
A2 - Freund, Stephen N.
A2 - Yahav, Eran
PB - Association for Computing Machinery
T2 - 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021
Y2 - 20 June 2021 through 25 June 2021
ER -