High-performance scientific data management system

Jaechun No, Rajeev Thakur*, Alok Choudhary

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store real data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. In order to support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to optimize the cost of the index distribution using the metadata stored in database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.

Original languageEnglish (US)
Pages (from-to)434-447
Number of pages14
JournalJournal of Parallel and Distributed Computing
Volume63
Issue number4
DOIs
StatePublished - Apr 1 2003

Keywords

  • Database
  • MPI-IO
  • Metadata
  • Parallel I/O
  • Scientific data management

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Fingerprint Dive into the research topics of 'High-performance scientific data management system'. Together they form a unique fingerprint.

Cite this