Integrating parallel file I/O and database support for high-performance scientific data management

Jaechun No, Rajeev Thakur, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

17 Scopus citations

Abstract

Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions are used for this problem: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that aims to combine the good features of both file I/O and databases. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. SDM takes advantage of various I/O optimizations available in MPI-IO, such as collective I/O and noncontiguous requests, in a manner that is transparent to the user. As a result, users can write and retrieve data with the performance of parallel file I/O, without having to bother with the details of actually performing file I/O. In this paper, we describe the design and implementation of SDM. With the help of two parallel application templates, ASTRO3D and an Euler solver, we illustrate how some of the design criteria affect performance.
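The abstract's central design point is that bulk array data lives in files while application metadata lives in a database. A toy, single-process Python sketch can illustrate that split. Everything here is an assumption for illustration, not SDM's actual API or storage layout:

- the class `SDMSketch` and its methods;
- the `datasets` table;
- the dataset name `density_t0`;
- the use of SQLite and a flat binary file.

The sketch performs no parallel I/O. The `runs` method shows why noncontiguous requests matter. Reading a 2-D sub-block from a row-major file touches one separate byte range per row. MPI-IO can express that pattern as a single request, for example through a subarray datatype. With collective I/O, it can also merge such requests across processes.

```python
import os
import sqlite3
import struct
import tempfile

DOUBLE = 8  # bytes per float64 element


class SDMSketch:
    """Hypothetical toy model (not SDM's real API): array bytes go to a plain
    binary file, while dataset metadata goes to a relational database."""

    def __init__(self, data_path, db_path=":memory:"):
        self.data_path = data_path
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS datasets ("
            "name TEXT PRIMARY KEY, nx INTEGER, ny INTEGER, byte_offset INTEGER)"
        )
        open(data_path, "ab").close()  # create the data file if it is missing

    def write_dataset(self, name, rows):
        """Append a row-major 2-D float64 array and record where it starts."""
        ny = len(rows[0])
        if any(len(r) != ny for r in rows):
            raise ValueError("all rows must have the same length")
        with open(self.data_path, "r+b") as f:
            f.seek(0, os.SEEK_END)
            base = f.tell()  # byte offset where this dataset begins
            for row in rows:
                f.write(struct.pack(f"<{ny}d", *row))
        self.db.execute(
            "INSERT INTO datasets VALUES (?, ?, ?, ?)", (name, len(rows), ny, base)
        )
        self.db.commit()

    def runs(self, name, r0, r1, c0, c1):
        """(offset, length) byte ranges for rows r0..r1-1, columns c0..c1-1.
        Each selected row is one contiguous run; together the runs form a
        noncontiguous request."""
        meta = self.db.execute(
            "SELECT nx, ny, byte_offset FROM datasets WHERE name = ?", (name,)
        ).fetchone()
        if meta is None:
            raise KeyError(name)
        nx, ny, base = meta
        if not (0 <= r0 < r1 <= nx and 0 <= c0 < c1 <= ny):
            raise IndexError("sub-block out of range")
        return [(base + (r * ny + c0) * DOUBLE, (c1 - c0) * DOUBLE)
                for r in range(r0, r1)]

    def read_subarray(self, name, r0, r1, c0, c1):
        """Service the noncontiguous request with one seek + read per run."""
        out = []
        with open(self.data_path, "rb") as f:
            for off, length in self.runs(name, r0, r1, c0, c1):
                f.seek(off)
                out.append(list(struct.unpack(f"<{length // DOUBLE}d", f.read(length))))
        return out


with tempfile.TemporaryDirectory() as tmp:
    sdm = SDMSketch(os.path.join(tmp, "data.bin"))
    # 3x4 grid whose values encode their position: 10 * row + col
    grid = [[float(10 * i + j) for j in range(4)] for i in range(3)]
    sdm.write_dataset("density_t0", grid)
    pieces = sdm.runs("density_t0", 1, 3, 1, 3)
    block = sdm.read_subarray("density_t0", 1, 3, 1, 3)

print(pieces)  # [(40, 16), (72, 16)]
print(block)   # [[11.0, 12.0], [21.0, 22.0]]
```

The Python loop of seeks in `read_subarray` is the part a system like the one described would plausibly hand to MPI-IO as one noncontiguous (and possibly collective) request. The database lookup in `runs` stays cheap because it touches only metadata.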

Original language: English (US)
Title of host publication: SC 2000 - Proceedings of the 2000 ACM/IEEE Conference on Supercomputing
Publisher: Association for Computing Machinery
ISBN (Electronic): 0780398025
State: Published - 2000
Externally published: Yes
Event: 2000 ACM/IEEE Conference on Supercomputing, SC 2000 - Dallas, United States
Duration: Nov 4 2000 to Nov 10 2000

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: 2000-November

Conference

Conference: 2000 ACM/IEEE Conference on Supercomputing, SC 2000
Country/Territory: United States
City: Dallas
Period: 11/4/00 to 11/10/00

Funding

This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

ASJC Scopus subject areas

  • General Computer Science
