TY - JOUR
T1 - Design and implementation of a parallel I/O runtime system for irregular applications
AU - No, Jaechun
AU - Park, Sung-soon
AU - Carretero, Jesus
AU - Choudhary, Alok
AU - Chen, Pang
N1 - Funding Information:
This work was supported in part by Sandia National Labs award AV-6193 under the ASCI program, and in part by NSF Young Investigator Award CCR-9357840 and NSF CCR-9509143.
PY - 1998
Y1 - 1998
N2 - In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two models, namely, 'Collective I/O' and 'Pipelined Collective I/O'. In the first scheme, all processors participate in the I/O simultaneously, which simplifies the scheduling of I/O requests but creates a possibility of contention at the I/O nodes. In the second approach, processors are divided into several groups, so that only one group performs I/O at a time while the next group performs communication to rearrange data; this entire process is pipelined to dynamically reduce I/O node contention. Both models have been optimized by using software caching, chunking and on-line compression mechanisms. We demonstrate that we can obtain significantly higher I/O performance than has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine at Sandia National Labs.
AB - In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two models, namely, 'Collective I/O' and 'Pipelined Collective I/O'. In the first scheme, all processors participate in the I/O simultaneously, which simplifies the scheduling of I/O requests but creates a possibility of contention at the I/O nodes. In the second approach, processors are divided into several groups, so that only one group performs I/O at a time while the next group performs communication to rearrange data; this entire process is pipelined to dynamically reduce I/O node contention. Both models have been optimized by using software caching, chunking and on-line compression mechanisms. We demonstrate that we can obtain significantly higher I/O performance than has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine at Sandia National Labs.
UR - http://www.scopus.com/inward/record.url?scp=0031704395&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031704395&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:0031704395
SN - 1063-7133
SP - 280
EP - 284
JO - Proceedings of the International Parallel Processing Symposium, IPPS
JF - Proceedings of the International Parallel Processing Symposium, IPPS
T2 - Proceedings of the 1998 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing
Y2 - 30 March 1998 through 3 April 1998
ER -