TY - JOUR
T1 - Region templates
T2 - Data representation and management for high-throughput image analysis
AU - Teodoro, George
AU - Pan, Tony
AU - Kurc, Tahsin
AU - Kong, Jun
AU - Cooper, Lee Alex Donald
AU - Klasky, Scott
AU - Saltz, Joel
N1 - Funding Information:
This work was supported in part by HHSN261200800001E and 1U24CA180924-01A1 from the NCI, R24HL085343 from the NHLBI, R01LM011119-01 and R01LM009239 from the NLM, RC4MD005964 from the NIH, PHS UL1RR025008 from the NIH CTSA, and CNPq. This work is supported in part by the NIH K25CA181503. This research used resources provided by the XSEDE Science Gateways program and the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the NSF under Contract OCI-0910735.
PY - 2014/12
Y1 - 2014/12
N2 - We introduce a region template abstraction and framework for the efficient storage, management, and processing of common data types in the analysis of large datasets of high-resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may itself be represented as another dataflow of finer-grained tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling to maximize the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high-speed, disk-based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13×. Finally, a processing rate of 11,730 4K × 4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1,200 CPU cores). This computation rate enables studies with very large datasets.
AB - We introduce a region template abstraction and framework for the efficient storage, management, and processing of common data types in the analysis of large datasets of high-resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may itself be represented as another dataflow of finer-grained tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling to maximize the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high-speed, disk-based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13×. Finally, a processing rate of 11,730 4K × 4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1,200 CPU cores). This computation rate enables studies with very large datasets.
KW - GPGPU
KW - Heterogeneous environments
KW - Image analysis
KW - Microscopy imaging
KW - Storage and I/O
UR - http://www.scopus.com/inward/record.url?scp=84908545293&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908545293&partnerID=8YFLogxK
U2 - 10.1016/j.parco.2014.09.003
DO - 10.1016/j.parco.2014.09.003
M3 - Article
C2 - 26139953
AN - SCOPUS:84908545293
VL - 40
SP - 589
EP - 610
JO - Parallel Computing
JF - Parallel Computing
SN - 0167-8191
IS - 10
ER -