Abstract
More and more parallel applications run in distributed environments to take advantage of readily available, inexpensive commodity resources. For data-intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that can not only effectively manage the various distributed storage resources in a system, but also provide novel high-performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, and a number of new techniques, such as data location, data replication, subfile, superfile and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving performance. Although I/O optimization techniques can help improve performance, they also complicate the I/O system. In addition, most optimization techniques have their limitations. Selecting accurate optimization policies therefore requires expert knowledge, which cannot be expected of end users who may know little about I/O techniques. The task of making I/O optimization decisions should thus be left to the I/O system itself; that is, it should be automatic from the user's point of view. We present a User Access Pattern data structure, associated with each dataset, that helps MS-I/O make accurate I/O optimization decisions easily.
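To make the idea of a per-dataset access pattern concrete, the following is a minimal sketch in Python. It is not the MS-I/O data structure or its decision policy: the `AccessPattern` fields, the `choose_optimizations` function, and every threshold are hypothetical, chosen only to show how a record attached to a dataset might let the I/O system pick several optimizations at once without user involvement.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessPattern:
    """Hypothetical per-dataset access hints; all field names are assumptions."""
    dataset: str
    size_mb: float             # total dataset size in megabytes
    fraction_read: float       # share of the dataset a typical read touches (0..1)
    reads_per_run: int         # number of times the dataset is read in one run
    shared_by_all_ranks: bool  # True if every process accesses the same dataset
    remote: bool               # True if the dataset lives on remote storage


def choose_optimizations(p: AccessPattern) -> List[str]:
    """Select optimizations using illustrative rules with assumed thresholds."""
    chosen: List[str] = []
    if p.shared_by_all_ranks:
        # Many processes touching one dataset: merge their requests.
        chosen.append("collective I/O")
    if p.remote and p.reads_per_run > 1:
        # Repeated remote reads: keep a copy on faster, closer storage.
        chosen.append("data replication")
    if p.fraction_read < 0.25:
        # Only a small part is needed: store it in pieces that can be fetched alone.
        chosen.append("subfile")
    if p.size_mb < 1.0:
        # Very small dataset: group it with others into one larger file.
        chosen.append("superfile")
    if not chosen:
        # No rule matched: fall back to overlapping I/O with computation.
        chosen.append("asynchronous I/O")
    return chosen


if __name__ == "__main__":
    pattern = AccessPattern("temperature", 2048.0, 0.1, 5, True, True)
    print(choose_optimizations(pattern))
```

Because the function returns a list, several optimizations can apply to the same access session, which mirrors the abstract's point that schemes can work simultaneously. The user supplies only the description of how the dataset is accessed; the choice of technique stays inside the I/O layer.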
Original language | English (US) |
---|---|
Pages (from-to) | 1623-1643 |
Number of pages | 21 |
Journal | Parallel Computing |
Volume | 29 |
Issue number | 11-12 SPEC.ISS. |
State | Published - Nov 2003 |
Funding
This research was supported in part by the Department of Energy under the Accelerated Strategic Computing Initiative (ASCI) Academic Strategic Alliance Program (ASAP) Level 2, under subcontract no. W-7405-ENG-48 from Lawrence Livermore National Laboratory. We would like to thank Regan Moore and Mike Wan of SDSC for helping us with the usage of SRB. We thank Mike Gleicher and Tom Sherwin of SDSC for answering our HPSS questions.
Keywords
- Access pattern
- Data intensive computing
- Multi-storage I/O system
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence