TY - JOUR
T1 - Efficient irregular wavefront propagation algorithms on hybrid CPU-GPU machines
AU - Teodoro, George
AU - Pan, Tony
AU - Kurc, Tahsin M.
AU - Kong, Jun
AU - Cooper, Lee Alex Donald
AU - Saltz, Joel H.
N1 - Funding Information:
The authors would like to express their gratitude to the anonymous reviewers for their valuable comments, which helped us to improve the presentation of this paper. This research was funded, in part, by grants from the National Institutes of Health through contract HHSN261200800001E by the National Cancer Institute; and contracts 5R01LM009239-04 and 1R01LM011119-01 from the National Library of Medicine, R24HL085343 from the National Heart Lung and Blood Institute, NIH NIBIB BISTI P20EB000591, RC4MD005964 from National Institutes of Health, and PHS Grant UL1TR000454 from the Clinical and Translational Science Award Program, National Institutes of Health, National Center for Advancing Translational Sciences. This research used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We also want to thank Pavel Karas for releasing the SR_GPU implementation used in our comparative evaluation.
PY - 2013
Y1 - 2013
N2 - We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. Wedevelop and evaluate strategies for efficient computation and propagation of wavefronts using a multilevel queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with three GPUs and two multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50× and 85× with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.
AB - We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. Wedevelop and evaluate strategies for efficient computation and propagation of wavefronts using a multilevel queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with three GPUs and two multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50× and 85× with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.
KW - Cooperative CPU-GPU execution
KW - Euclidean distance transform
KW - GPGPU
KW - Heterogeneous environments
KW - Irregular wavefront propagation pattern
KW - Morphological reconstruction
UR - http://www.scopus.com/inward/record.url?scp=84884853346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84884853346&partnerID=8YFLogxK
U2 - 10.1016/j.parco.2013.03.001
DO - 10.1016/j.parco.2013.03.001
M3 - Article
C2 - 23908562
AN - SCOPUS:84884853346
SN - 0167-8191
VL - 39
SP - 189
EP - 211
JO - Parallel Computing
JF - Parallel Computing
IS - 4-5
ER -