PeerWave: Exploiting wavefront parallelism on GPUs with peer-SM synchronization

Mehmet E. Belviranli, Peng Deng, Laxmi N. Bhuyan, Rajiv Gupta, Qi Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations


Nested loops with regular iteration dependencies span a large class of applications ranging from string matching to linear system solvers. Wavefront parallelism is a well-known technique to enable concurrent processing of such applications and is widely used on GPUs to benefit from their massively parallel computing capabilities. Wavefront parallelism on GPUs uses global barriers between the processing of tiles to enforce data dependencies. However, such diagonal-wide synchronization causes load imbalance by forcing SMs to wait for the completion of the SM with the longest computation. Moreover, diagonal processing causes loss of locality due to elements that border adjacent tiles. In this paper, we propose PeerWave, an alternative GPU wavefront parallelization technique that improves inter-SM load balance by using peer-wise synchronization between SMs and eliminating global synchronization. Our approach also increases GPU L2 cache locality through row allocation of tiles to the SMs. We further improve PeerWave performance by using flexible hyper-tiles that reduce inter-SM wait time while maximizing intra-SM utilization. We develop an analytical model for determining the optimal tile size. Finally, we present a runtime and a CUDA-based API to allow users to easily implement their applications using PeerWave. We evaluate PeerWave on the NVIDIA K40c GPU using 6 different applications and achieve speedups of up to 2X compared to the most recent hyperplane transformation based GPU implementation.
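The load-imbalance argument in the abstract can be illustrated with a small timing model. The sketch below is not PeerWave's runtime or API; it is a host-side Python illustration under simplifying assumptions that are not taken from the paper: each tile (i, j) depends on its upper and left neighbors, there are enough SMs to run every tile of a diagonal (or every row) concurrently, and synchronization overhead and cache effects are ignored. The function names `barrier_makespan` and `peer_makespan` and the cost matrix are hypothetical.

```python
def barrier_makespan(cost):
    """Total time when a global barrier separates consecutive anti-diagonals.

    Assumption: every tile on a diagonal runs concurrently, so each diagonal
    costs as much as its slowest tile.
    """
    rows, cols = len(cost), len(cost[0])
    total = 0
    for d in range(rows + cols - 1):
        diagonal = [cost[i][d - i] for i in range(rows) if 0 <= d - i < cols]
        total += max(diagonal)
    return total


def peer_makespan(cost):
    """Total time when row i is owned by one SM that waits only on its peer.

    Assumption: tile (i, j) starts once the same SM has finished tile
    (i, j-1) and the peer SM owning row i-1 has finished tile (i-1, j).
    """
    rows, cols = len(cost), len(cost[0])
    finish = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            left = finish[i][j - 1] if j > 0 else 0
            up = finish[i - 1][j] if i > 0 else 0
            finish[i][j] = max(left, up) + cost[i][j]
    return finish[-1][-1]


# Hypothetical per-tile costs with two expensive tiles on different diagonals.
cost = [
    [1, 5, 1],
    [1, 1, 1],
    [5, 1, 1],
]

if __name__ == "__main__":
    print("global barriers:", barrier_makespan(cost))
    print("peer-wise sync: ", peer_makespan(cost))
```

For this sample matrix the barrier schedule takes 13 time units and the peer-wise schedule 9, because a slow tile delays only the tiles that actually depend on it rather than the whole diagonal. A real GPU has a fixed number of SMs and non-zero synchronization cost, so the model shows only the direction of the effect, not its magnitude.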

Original language: English (US)
Title of host publication: ICS 2015 - Proceedings of the 29th ACM International Conference on Supercomputing
Publisher: Association for Computing Machinery
Number of pages: 11
ISBN (Electronic): 9781450335591
State: Published - Jun 8 2015
Event: 29th ACM International Conference on Supercomputing, ICS 2015 - Newport Beach, United States
Duration: Jun 8 2015 → Jun 11 2015

Publication series

Name: Proceedings of the International Conference on Supercomputing


Other: 29th ACM International Conference on Supercomputing, ICS 2015
Country/Territory: United States
City: Newport Beach


  • Decentralized synchronization
  • GP-GPU computing
  • Wavefront parallelism

ASJC Scopus subject areas

  • Computer Science(all)

