TY - GEN
T1 - Hard real-time scheduling for parallel run-time systems
AU - Dinda, Peter A
AU - Wang, Xiaoyang
AU - Wang, Jinghang
AU - Beauchene, Chris
AU - Hetland, Conor
N1 - Funding Information:
This project is made possible by support from the United States National Science Foundation through grant CCF-1533560 and from Sandia National Laboratories through the Hobbes Project, which is funded by the 2013 Exascale Operating and Runtime Systems Program under the Office of Advanced Scientific Computing Research in the United States Department of Energy’s Office of Science.
Publisher Copyright:
© 2018 Copyright held by the owner/author(s).
PY - 2018/6/11
Y1 - 2018/6/11
AB - High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form and in a group form in which a group of parallel threads executes in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to a resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
KW - HPC
KW - Hard real-time systems
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85050120059&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050120059&partnerID=8YFLogxK
U2 - 10.1145/3208040.3208052
DO - 10.1145/3208040.3208052
M3 - Conference contribution
AN - SCOPUS:85050120059
T3 - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing
SP - 14
EP - 26
BT - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery, Inc
T2 - 27th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2018
Y2 - 11 June 2018 through 15 June 2018
ER -