Region templates: Data representation and management for high-throughput image analysis

George Teodoro*, Tony Pan, Tahsin Kurc, Jun Kong, Lee Alex Donald Cooper, Scott Klasky, Joel Saltz

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

7 Scopus citations

Abstract

We introduce a region template abstraction and framework for the efficient storage, management and processing of common data types in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling for maximizing the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high speed disk based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13x. Finally, a processing rate of 11,730 4K x 4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1200 CPU cores). This computation rate enables studies with very large datasets.

Original languageEnglish (US)
Pages (from-to)589-610
Number of pages22
JournalParallel Computing
Volume40
Issue number10
DOIs
StatePublished - Dec 2014

Keywords

  • GPGPU
  • Heterogeneous environments
  • Image analysis
  • Microscopy imaging
  • Storage and I/O

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Fingerprint Dive into the research topics of 'Region templates: Data representation and management for high-throughput image analysis'. Together they form a unique fingerprint.

Cite this