TY - GEN
T1 - Supporting computational data model representation with high-performance I/O in parallel netCDF
AU - Gao, Kui
AU - Jin, Chen
AU - Choudhary, Alok
AU - Liao, Wei Keng
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - Parallel computational scientific applications have traditionally been characterized by their computation and communication patterns. From a storage and I/O perspective, these applications can also be grouped into distinct data models based on how data is organized and accessed during simulation, analysis, and visualization. Parallel netCDF is a popular library that many scientific applications use to store datasets and that provides high-performance parallel I/O. Although the metadata-rich netCDF file format can effectively store and describe regular multi-dimensional array datasets, it does not address the full range of current and future computational science data models. In this paper, we present a new storage scheme in Parallel netCDF that represents a broad variety of data models used in modern computational scientific applications. The scheme also allows multiple groups of application processes to construct metadata for different data objects concurrently, an important feature for obtaining a high degree of I/O parallelism in data models with irregular data distribution. Furthermore, we employ non-blocking I/O functions to aggregate irregularly distributed data requests into large, contiguous requests and thereby achieve high-performance I/O. Using an adaptive mesh refinement data model as an example, we demonstrate that the proposed scheme delivers scalable performance for both data and metadata creation and access.
KW - Data Model
KW - Parallel I/O
KW - Parallel netCDF
UR - http://www.scopus.com/inward/record.url?scp=84858025722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858025722&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2011.6152746
DO - 10.1109/HiPC.2011.6152746
M3 - Conference contribution
AN - SCOPUS:84858025722
SN - 9781457719516
T3 - 18th International Conference on High Performance Computing, HiPC 2011
BT - 18th International Conference on High Performance Computing, HiPC 2011
PB - IEEE Computer Society
T2 - 18th International Conference on High Performance Computing, HiPC 2011
Y2 - 18 December 2011 through 21 December 2011
ER -