Hard real-time scheduling for parallel run-time systems

Peter A Dinda, Xiaoyang Wang, Jinghang Wang, Chris Beauchene, Conor Hetland

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads execute in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to a resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
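
The cycle and microsecond figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the clock rate is not stated in the abstract, so it is inferred here from the abstract's own approximate pair (∼13,000 cycles, ∼10 μs), and the function names are hypothetical.

```python
# Figures quoted in the abstract (all approximate, marked with "~" there).
CONSTRAINT_CYCLES = 13_000   # finest timing-constraint resolution, in cycles
CONSTRAINT_US = 10.0         # the same quantity, in microseconds
SYNC_CYCLES = 4_000          # synchronization bound across 255 threads


def implied_clock_ghz(cycles: float, microseconds: float) -> float:
    """Clock rate in GHz implied by a (cycles, microseconds) pair.

    1 GHz corresponds to 1,000 cycles per microsecond.
    """
    return cycles / (microseconds * 1_000.0)


def cycles_to_us(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / (clock_ghz * 1_000.0)


# Assumption: the clock is inferred from the abstract, not taken from a datasheet.
clock = implied_clock_ghz(CONSTRAINT_CYCLES, CONSTRAINT_US)
sync_us = cycles_to_us(SYNC_CYCLES, clock)

print(f"implied clock: {clock:.2f} GHz")
print(f"sync bound:    {sync_us:.2f} us")
```

Under this inferred clock of about 1.3 GHz, the ∼4,000-cycle synchronization bound works out to roughly 3 μs, which agrees with the figure the abstract reports.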

Original language: English (US)
Title of host publication: HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 14-26
Number of pages: 13
ISBN (Electronic): 9781450357852
DOI: https://doi.org/10.1145/3208040.3208052
State: Published - Jun 11 2018
Event: 27th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2018 - Tempe, United States
Duration: Jun 11 2018 to Jun 15 2018

Publication series

Name: HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing

Other

Other: 27th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2018
Country: United States
City: Tempe
Period: 6/11/18 to 6/15/18

Fingerprint

Synchronization
Scheduling
Parallel processing systems
Data storage equipment
Real time systems
Hardware

Keywords

  • HPC
  • Hard real-time systems
  • Parallel computing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Cite this

Dinda, P. A., Wang, X., Wang, J., Beauchene, C., & Hetland, C. (2018). Hard real-time scheduling for parallel run-time systems. In HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing (pp. 14-26). (HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing). Association for Computing Machinery, Inc. https://doi.org/10.1145/3208040.3208052
@inproceedings{518703a247cb4e72a5f43f01a046362c,
title = "Hard real-time scheduling for parallel run-time systems",
keywords = "HPC, Hard real-time systems, Parallel computing",
author = "Dinda, {Peter A} and Xiaoyang Wang and Jinghang Wang and Chris Beauchene and Conor Hetland",
year = "2018",
month = "6",
day = "11",
doi = "10.1145/3208040.3208052",
language = "English (US)",
series = "HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing",
publisher = "Association for Computing Machinery, Inc",
pages = "14--26",
booktitle = "HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing",

}

TY - GEN

T1 - Hard real-time scheduling for parallel run-time systems

AU - Dinda, Peter A

AU - Wang, Xiaoyang

AU - Wang, Jinghang

AU - Beauchene, Chris

AU - Hetland, Conor

PY - 2018/6/11

Y1 - 2018/6/11

KW - HPC

KW - Hard real-time systems

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=85050120059&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050120059&partnerID=8YFLogxK

U2 - 10.1145/3208040.3208052

DO - 10.1145/3208040.3208052

M3 - Conference contribution

AN - SCOPUS:85050120059

T3 - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing

SP - 14

EP - 26

BT - HPDC 2018 - Proceedings of the 2018 International Symposium on High-Performance Parallel and Distributed Computing

PB - Association for Computing Machinery, Inc

ER -
