TY - GEN
T1 - Hard real-time scheduling for parallel run-time systems
AU - Dinda, Peter A
AU - Wang, Xiaoyang
AU - Wang, Jinghang
AU - Beauchene, Chris
AU - Hetland, Conor
PY - 2018/6/11
Y1 - 2018/6/11
N2 - High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads execute in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
AB - High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads execute in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
KW - HPC
KW - Hard real-time systems
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85050120059&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050120059&partnerID=8YFLogxK
U2 - 10.1145/3208040.3208052
DO - 10.1145/3208040.3208052
M3 - Conference contribution
AN - SCOPUS:85050120059
T3 - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing
SP - 14
EP - 26
BT - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery, Inc
T2 - 27th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2018
Y2 - 11 June 2018 through 15 June 2018
ER -