Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures

Poornima Nookala, Peter Dinda, Kyle C. Hale, Kyle Chard, Ioan Raicu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations

Abstract

Enabling efficient fine-grained task parallelism is a significant challenge for hardware platforms with increasingly many cores. Existing techniques do not scale to hundreds of threads due to the high cost of synchronization in concurrent data structures. To overcome these limitations, we present XQueue, a novel lock-less concurrent queuing system with relaxed ordering semantics that is geared towards realizing scalability up to hundreds of concurrent threads. We demonstrate the scalability of XQueue using microbenchmarks and show that XQueue can deliver concurrent operations with latencies as low as 110 cycles at scales of up to 192 cores (up to 6900× improvement compared to traditional synchronization mechanisms) across our diverse hardware, including x86, ARM, and Power9. The reduced latency allows XQueue to provide orders of magnitude (3300×) better throughput than existing techniques. To evaluate the real-world benefits of XQueue, we integrated XQueue with LLVM OpenMP and evaluated five unmodified benchmarks from the Barcelona OpenMP Task Suite (BOTS) as well as a graph traversal benchmark from the GAP benchmark suite. We compared the XQueue-enabled LLVM OpenMP implementation with the native LLVM and GNU OpenMP versions. Using fine-grained task workloads, XQueue can deliver 4× to 6× speedup compared to native GNU OpenMP and LLVM OpenMP in many cases, with speedups as high as 116× in some cases.
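
As a rough illustration of what "lock-less" can mean for a queue, the sketch below shows a bounded single-producer/single-consumer ring buffer that synchronizes with acquire/release atomics instead of a mutex. This is a generic textbook construction and an assumption on our part, not the XQueue design described in the paper; the names `SpscRing`, `try_push`, `try_pop`, and `push_then_drain` are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical illustration only: a bounded ring buffer that is safe for
// exactly one producer thread and one consumer thread. It is NOT XQueue.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called by the producer only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) {
            return false;  // no free slot
        }
        slots_[tail & (Capacity - 1)] = value;
        // Release: the slot write above becomes visible before the new tail.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Called by the consumer only. Returns an empty optional when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) {
            return std::nullopt;  // nothing to read
        }
        T value = slots_[head & (Capacity - 1)];
        // Release: the slot read above completes before the slot is reused.
        head_.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    T slots_[Capacity];
    // Separate cache lines (64 bytes assumed) to limit false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Single-threaded smoke test: pushes 1..n, draining whenever the ring is
// full, and returns the sum of everything popped.
long long push_then_drain(int n) {
    SpscRing<int, 1024> ring;
    long long sum = 0;
    for (int i = 1; i <= n; ++i) {
        while (!ring.try_push(i)) {
            if (auto v = ring.try_pop()) {
                sum += *v;
            }
        }
    }
    while (auto v = ring.try_pop()) {
        sum += *v;
    }
    return sum;
}
```

Because each index is written by only one side, neither operation needs a lock or a compare-and-swap loop. How XQueue itself scales such ideas to hundreds of threads, and how it relaxes ordering, is covered in the paper rather than in this sketch.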

Original language: English (US)
Title of host publication: Proceedings - 29th International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2021
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665458382
State: Published - 2021
Event: 29th International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2021 - Houston, United States
Duration: Nov 3 2021 → Nov 5 2021

Publication series

Name: Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS
ISSN (Print): 1526-7539

Conference

Conference: 29th International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2021
Country/Territory: United States
City: Houston
Period: 11/3/21 → 11/5/21

Keywords

  • concurrent data structures
  • fine-grained parallelism
  • lock-free
  • lock-less
  • parallel runtime
  • queues
  • tasks

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications
  • Software
  • Modeling and Simulation
