Improving MPI Collective I/O for High Volume Non-Contiguous Requests with Intra-Node Aggregation

Qiao Kang*, Sunwoo Lee, Kaiyuan Hou, Robert Ross, Ankit Agrawal, Alok Choudhary, Wei Keng Liao

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

1 Scopus citation

Abstract

Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes file access costs. As modern parallel computers continue to grow into the exascale era, the communication cost of such request redistribution can quickly overwhelm collective I/O performance. This effect has been observed in parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O that adds an extra communication layer to perform request aggregation among processes within the same compute node. This approach can significantly reduce inter-node communication contention when redistributing the I/O requests. We evaluate the performance and compare it with the original two-phase I/O on Cray XC40 parallel computers (Theta and Cori) with Intel KNL and Haswell processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show that our proposed method effectively reduces the communication cost and hence maintains scalability for a large number of processes.
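To make the communication argument concrete, the following is a minimal counting sketch in Python. It is not the authors' implementation. It is a toy model that assumes every sending process has data for every global aggregator, that global aggregators are placed round-robin across nodes, and that a message counts as inter-node only when sender and receiver sit on different nodes. The function name `inter_node_messages` and all of its parameters are hypothetical and chosen for illustration only.

```python
def inter_node_messages(num_nodes, procs_per_node, num_global_aggs,
                        local_aggs_per_node=None):
    """Count worst-case inter-node messages in one redistribution round.

    Toy model (an assumption, not the paper's algorithm):
      * every sender has data destined for every global aggregator;
      * global aggregator i lives on node i % num_nodes;
      * without intra-node aggregation, every process is a sender;
      * with it, only `local_aggs_per_node` processes per node send
        off-node, after gathering requests from their node-local peers.
    """
    if local_aggs_per_node is None:
        senders_per_node = procs_per_node
    else:
        senders_per_node = min(local_aggs_per_node, procs_per_node)

    total = 0
    for node in range(num_nodes):
        for agg in range(num_global_aggs):
            agg_node = agg % num_nodes
            if agg_node != node:
                # Each sender on this node sends one message off-node.
                total += senders_per_node
    return total


if __name__ == "__main__":
    # Hypothetical configuration: 4 nodes, 64 processes each, 4 aggregators.
    print(inter_node_messages(4, 64, 4))                         # 768
    print(inter_node_messages(4, 64, 4, local_aggs_per_node=1))  # 12
```

In this model the inter-node message count scales with the number of off-node senders per node rather than with the full process count per node, which is the effect the abstract attributes to the extra intra-node layer. The model ignores message sizes, contention, and the cost of the intra-node gather itself.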

Original language: English (US)
Article number: 9109678
Pages (from-to): 2682-2695
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 11
State: Published - Nov 1, 2020

Keywords

  • MPI collective I/O
  • Parallel I/O
  • Non-contiguous I/O
  • Two-phase I/O

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
