Abstract
Parallel I/O is a critical technique for moving data between the compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise Hierarchical Data Format version 5 (HDF5) I/O on parallel file systems along numerous dimensions. Our focus on HDF5 is due to the library's heavy usage in various scientific applications running on supercomputing systems. The dimensions exercised by the h5bench suite include I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (one-dimensional arrays, two-dimensional meshes, and three-dimensional cubes), and I/O modes (synchronous and asynchronous). In this paper, we present the observed performance of h5bench executed along several of these dimensions on existing supercomputers (Cori and Summit) and pre-exascale platforms (Perlmutter, Theta, and Polaris). h5bench measurements can be used to identify performance bottlenecks and their root causes and to evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.
Original language | English (US) |
---|---|
Article number | e8046 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 36 |
Issue number | 16 |
DOIs | |
State | Published - Jul 25 2024 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This work was supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research used resources of NERSC, a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC award ASCR-ERCAP0021411. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- HDF5
- I/O access patterns
- I/O benchmarks
- I/O performance
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics