TY - GEN
T1 - Using subfiling to improve programming flexibility and performance of parallel shared-file I/O
AU - Gao, Kui
AU - Liao, Wei Keng
AU - Nisar, Arifa
AU - Choudhary, Alok
AU - Ross, Robert
AU - Latham, Robert
PY - 2009/12/1
Y1 - 2009/12/1
AB - There are two popular parallel I/O programming styles used by modern scientific computational applications: unique-file and shared-file. Unique-file I/O usually gives satisfactory performance, but its major drawback is that managing a large number of files can overwhelm post-simulation data processing. Shared-file I/O produces fewer files and allows arrays partitioned among processes to be saved in their canonical order. As the number of processors on modern parallel machines grows into the thousands and beyond, the problem size, and in turn the global array size, increases proportionally. Managing files that are each larger than a few hundred GB is impractical. Hence, to find a middle ground between these two I/O styles, we propose a subfiling scheme that divides a large multi-dimensional global array into smaller subarrays, each saved in a smaller file called a subfile. Subfiling is implemented on top of MPI-IO. We also incorporate it into the Parallel netCDF library to preserve the partitioning information in the netCDF file header, so that the global array can later be reconstructed. In addition, because subfiling decreases the number of processes sharing a file, it can reduce the overhead of the file system's data consistency control. Our experimental results with several I/O benchmarks show that subfiling can improve I/O performance.
KW - MPI-IO
KW - Parallel netCDF
KW - Subfiling
UR - http://www.scopus.com/inward/record.url?scp=77951432308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951432308&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2009.68
DO - 10.1109/ICPP.2009.68
M3 - Conference contribution
AN - SCOPUS:77951432308
SN - 9780769538020
T3 - Proceedings of the International Conference on Parallel Processing
SP - 470
EP - 477
BT - ICPP-2009 - The 38th International Conference on Parallel Processing
T2 - 38th International Conference on Parallel Processing, ICPP-2009
Y2 - 22 September 2009 through 25 September 2009
ER -