Paths to fast barrier synchronization on the node

Conor Hetland, Georgios Tziantzioulis, Brian Suchy, Michael Leonard, Jin Han, John Albers, Nikos Hardavellas, Peter A Dinda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Synchronization primitives like barriers heavily impact the performance of parallel programs. As core counts increase and granularity decreases, the value of enabling fast barriers increases. Through the evaluation of the performance of a variety of software implementations of barriers, we found the cost of software barriers to be on the order of tens of thousands of cycles on various incarnations of x64 hardware. We argue that reducing the latency of a barrier via hardware support will dramatically improve the performance of existing applications and runtimes, and would enable new execution models, including those which currently do not perform well on multicore machines. To support our argument, we first present the design, implementation, and evaluation of a barrier on the Intel HARP, a prototype that integrates an x64 processor and FPGA in the same package. This effort gives insight into the potential speed and compactness of hardware barriers, and suggests useful improvements to the HARP platform. Next, we turn to the processor itself and describe an x64 ISA extension for barriers, and how it could be implemented in the microarchitecture with minimal collateral changes. This design allows for barriers to be securely managed jointly between the OS and the application. Finally, we speculate on how barrier synchronization might be implemented on future photonics-based hardware.
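For readers unfamiliar with what a software barrier does, the following is a minimal sketch of one classic design, a centralized sense-reversing barrier. It is an illustration only and is not taken from the paper: the class name `SenseReversingBarrier`, the helper `run_phases`, and the use of a Python condition variable instead of spinning on shared memory are all assumptions made for this example, and the sketch says nothing about the cycle counts reported above.

```python
import threading


class SenseReversingBarrier:
    """Centralized sense-reversing barrier for a fixed number of threads (illustrative)."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = n_threads      # threads still expected in the current phase
        self.sense = False          # global sense, flipped once per completed phase
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            # The value the global sense will take once this phase completes.
            my_sense = not self.sense
            self.count -= 1
            if self.count == 0:
                # Last arrival: reset the counter, flip the sense, release everyone.
                self.count = self.n_threads
                self.sense = my_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals sleep until the last arrival flips the sense.
                while self.sense != my_sense:
                    self.cond.wait()


def run_phases(n_threads=4, n_phases=3):
    """Hypothetical demo: each thread records how many arrivals it observes after each barrier."""
    barrier = SenseReversingBarrier(n_threads)
    arrivals = [0] * n_phases
    seen = [[] for _ in range(n_threads)]
    lock = threading.Lock()

    def worker(tid):
        for phase in range(n_phases):
            with lock:
                arrivals[phase] += 1
            barrier.wait()
            # Past the barrier, every thread has already arrived in this phase.
            seen[tid].append(arrivals[phase])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `run_phases(4, 3)` should return a list in which every thread observed all four arrivals in each of the three phases, which is the guarantee a barrier provides. Flipping the sense on each phase lets the same barrier object be reused immediately without a separate reset step.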

Original language: English (US)
Title of host publication: HPDC 2019- Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 109-120
Number of pages: 12
ISBN (Electronic): 9781450366700
DOI: https://doi.org/10.1145/3307681.3325402
State: Published - Jun 17 2019
Event: 28th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2019 - Phoenix, United States
Duration: Jun 22 2019 to Jun 29 2019

Publication series

Name: HPDC 2019- Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 28th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2019
Country: United States
City: Phoenix
Period: 6/22/19 to 6/29/19

Fingerprint

Synchronization
Hardware
Photonics
Field programmable gate arrays (FPGA)
Costs

Keywords

  • Collective communication
  • HPC
  • Parallel computing
  • Synchronization

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Cite this

Hetland, C., Tziantzioulis, G., Suchy, B., Leonard, M., Han, J., Albers, J., ... Dinda, P. A. (2019). Paths to fast barrier synchronization on the node. In HPDC 2019- Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (pp. 109-120). (HPDC 2019- Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing). Association for Computing Machinery, Inc. https://doi.org/10.1145/3307681.3325402