Abstract
Accessing non-contiguous blocks across multiple array variables is an I/O pattern for which parallel applications struggle to obtain good performance. High-level I/O libraries such as HDF5 let users implement this pattern conveniently, but users have observed significant performance bottlenecks in the two-phase I/O implementation of MPI-IO. Recent studies have improved two-phase I/O performance with novel communication algorithms, but these improvements still have limitations: two-phase I/O must faithfully process the inputs it receives from high-level I/O libraries, so implementation overheads accumulate when those libraries are used improperly. In this paper, we propose approaches for using high-level I/O libraries efficiently that circumvent major collective I/O overheads. We adopt a multi-dataset implementation of HDF5 dataset I/O to aggregate non-contiguous requests for array blocks, and we provide corresponding parameter assignment strategies. These approaches reduce the overheads caused by communication straggler effects in two-phase I/O. On two supercomputing systems, our proposed methods improve parallel I/O performance by up to 8× over the baseline for HDF5 implementations of an I/O kernel extracted from climate simulation code.
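The idea of aggregating non-contiguous block requests from several variables can be illustrated with a small conceptual sketch. It is not the HDF5 multi-dataset API and not the authors' implementation: the variable names, the byte-range layout, and the helpers `coalesce` and `aggregate` are hypothetical, and the sketch only shows how per-variable requests might be combined into one request list so that a single collective operation could replace one operation per variable.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, int]  # (file offset in bytes, length in bytes)


def coalesce(requests: List[Request]) -> List[Request]:
    """Sort requests by offset and merge those that touch or overlap."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The new block starts inside or right at the end of the
            # previous one, so extend the previous block instead.
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


def aggregate(per_variable: Dict[str, List[Request]]) -> List[Request]:
    """Combine the block requests of every variable into one request list."""
    combined = [req for reqs in per_variable.values() for req in reqs]
    return coalesce(combined)


# Hypothetical layout: two variables whose blocks interleave in the file.
blocks = {
    "temperature": [(0, 4), (16, 4)],
    "pressure": [(4, 4), (20, 4), (40, 8)],
}

calls_per_variable = len(blocks)      # one collective call per variable
merged_requests = aggregate(blocks)   # one request list for all variables
```

In this toy layout, five small requests issued through two per-variable calls collapse into three larger contiguous ranges handled together. Real HDF5 datasets add datatype, dataspace, and chunking details that this sketch deliberately leaves out.
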
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
Editors | Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 98-108 |
Number of pages | 11 |
ISBN (Electronic) | 9781665439022 |
State | Published - 2021 |
Event | 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States; Duration: Dec 15 2021 → Dec 18 2021 |
Publication series
Name | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|---|
Conference
Conference | 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 12/15/21 → 12/18/21 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.
ASJC Scopus subject areas
- Information Systems and Management
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Information Systems