Scheduling threads for constructive cache sharing on CMPs

Shimin Chen*, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas, Todd C. Mowry, Chris Wilkerson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

119 Scopus citations


In chip multiprocessors (CMPs), limiting the number of offchip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this paper, we compare the performance of two state-of-the-art schedulers proposed for fine-grained multithreaded programs: Parallel Depth First (PDF), which is specifically designed for constructive cache sharing, and Work Stealing (WS), which is a more traditional design. Our experimental results indicate that PDF scheduling yields a 1.3 - 1.6X performance improvement relative to WS for several fine-grain parallel benchmarks on projected future CMP configurations; we also report several issues that may limit the advantage of PDF in certain applications. These results also indicate that PDF more effectively utilizes off-chip bandwidth, making it possible to trade-off on-chip cache for a larger number of cores. Moreover, we find that task granularity plays a key role in cache performance. Therefore, we present an automatic approach for selecting effective grain sizes, based on a new working set profiling algorithm that is an order of magnitude faster than previous approaches. This is the first paper demonstrating the effectiveness of PDF on real benchmarks, providing a direct comparison between PDF and WS, revealing the limiting factors for PDF in practice, and presenting an approach for overcoming these factors.

Original languageEnglish (US)
Title of host publicationSPAA'07
Subtitle of host publicationProceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
Number of pages11
StatePublished - 2007
EventSPAA'07: 19th Annual Symposium on Parallelism in Algorithms and Architectures - San Diego, CA, United States
Duration: Jun 9 2007Jun 11 2007

Publication series

NameAnnual ACM Symposium on Parallelism in Algorithms and Architectures


OtherSPAA'07: 19th Annual Symposium on Parallelism in Algorithms and Architectures
Country/TerritoryUnited States
CitySan Diego, CA


  • Chip multiprocessors
  • Constructive cache sharing
  • Parallel depth first
  • Scheduling algorithms
  • Thread granularity
  • Work stealing
  • Working set profiling

ASJC Scopus subject areas

  • Engineering(all)


Dive into the research topics of 'Scheduling threads for constructive cache sharing on CMPs'. Together they form a unique fingerprint.

Cite this