A distributed multi-storage I/O system for data intensive scientific computing

Xiaohui Shen*, Alok Choudhary

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

More and more parallel applications are running in a distributed environment to take advantage of easily available and inexpensive commodity resources. For data intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that cannot only effectively manage various distributed storage resources in the system, but also provide novel high performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations such as collective I/O, asynchronous I/O etc. and a number of new techniques such as data location, data replication, subfile, superfile and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving the performance. Although I/O optimization techniques can help improve performance, it also complicates I/O system. In addition, most optimization techniques have their limitations. Therefore, selecting accurate optimization policies requires expert knowledge which is not suitable for end users who may have little knowledge of I/O techniques. So the task of I/O optimization decision should be left to the I/O system itself, that is, automatic from user's point of view. We present a User Access Pattern data structure which is associated with each dataset that can help MS-I/O easily make accurate I/O optimization decisions.

Original languageEnglish (US)
Pages (from-to)1623-1643
Number of pages21
JournalParallel Computing
Volume29
Issue number11-12 SPEC.ISS.
DOIs
StatePublished - Nov 2003

Keywords

  • Access pattern
  • Data intensive computing
  • Multi-storage I/O system

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'A distributed multi-storage I/O system for data intensive scientific computing'. Together they form a unique fingerprint.

Cite this