TY - GEN
T1 - Exploring I/O strategies for parallel sequence-search tools with S3aSim
AU - Ching, Avery
AU - Feng, Wu-chun
AU - Lin, Heshan
AU - Ma, Xiaosong
AU - Choudhary, Alok
PY - 2006
Y1 - 2006
N2 - Parallel sequence-search tools are rising in popularity among computational biologists. With the rapid growth of sequence databases, database segmentation is the trend of the future for such search tools. While I/O currently is not a significant bottleneck for parallel sequence-search tools, future technologies including faster processors, customized computational hardware such as FPGAs, improved search algorithms, and exponentially growing databases will emphasize an increasing need for efficient parallel I/O in future parallel sequence-search tools. Our paper focuses on examining different I/O strategies for these future tools in a modern parallel file system (PVFS2). Because implementing and comparing various I/O algorithms in every search tool is labor-intensive and time-consuming, we introduce S3aSim, a general simulation framework for sequence-search which allows us to quickly implement, test, and profile various I/O strategies. We examine a variety of I/O strategies (e.g., master-writing and various worker-writing strategies using individual and collective I/O methods) for storing result data in sequence-search tools such as mpiBLAST, pioBLAST, and parallel HMMer. Our experiments fully detail the interaction of computing and I/O within a full application simulation as opposed to typical I/O-only benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=33845901845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845901845&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33845901845
SN - 1424403073
SN - 9781424403073
T3 - Proceedings of the IEEE International Symposium on High Performance Distributed Computing
SP - 229
EP - 240
BT - Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15
T2 - 15th IEEE International Symposium on High Performance Distributed Computing, HPDC-15
Y2 - 19 June 2006 through 23 June 2006
ER -