h5bench: A unified benchmark suite for evaluating HDF5 I/O performance on pre-exascale platforms

Jean Luca Bez*, Houjun Tang, Scot Breitenfeld, Huihuo Zheng, Wei Keng Liao, Kaiyuan Hou, Zanhua Huang, Suren Byna

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Parallel I/O is a critical technique for moving data between the compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. To provide representative I/O kernels derived from real-world applications, we have created h5bench, a set of I/O kernels that exercise Hierarchical Data Format version 5 (HDF5) I/O on parallel file systems along numerous dimensions. Our focus on HDF5 is due to the parallel I/O library's heavy usage in various scientific applications running on supercomputing systems. The tests in the h5bench suite cover I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (one-dimensional arrays, two-dimensional meshes, three-dimensional cubes), and I/O modes (synchronous and asynchronous). In this paper, we present the observed performance of h5bench executed along several of these dimensions on existing supercomputers (Cori and Summit) and pre-exascale platforms (Perlmutter, Theta, and Polaris). h5bench measurements can be used to identify performance bottlenecks and their root causes and to evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.

Original language: English (US)
Article number: e8046
Journal: Concurrency and Computation: Practice and Experience
Volume: 36
Issue number: 16
State: Published - Jul 25 2024

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This work was supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research used resources of the NERSC, a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC award ASCR-ERCAP0021411. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • HDF5
  • I/O access patterns
  • I/O benchmarks
  • I/O performance

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics
