TY - GEN
T1 - Indexing genomic sequences on the IBM Blue Gene
AU - Ghoting, Amol
AU - Makarychev, Konstantin
PY - 2009
Y1 - 2009
N2 - With advances in sequencing technology and through aggressive sequencing efforts, DNA sequence data sets have been growing at a rapid pace. To gain from these advances, it is important to provide life science researchers with the ability to process and query large sequence data sets. For the past three decades, the suffix tree has served as a fundamental data structure in processing sequential data sets. However, tree construction times on large data sets have been excessive. While parallel suffix tree construction is an obvious solution to reduce execution times, poor locality of reference has limited parallel performance. In this paper, we show that through careful parallel algorithm design, this limitation can be removed, allowing tree construction to scale to massively parallel systems like the IBM Blue Gene. We demonstrate that the entire Human genome can be indexed on 1024 processors in under 15 minutes.
AB - With advances in sequencing technology and through aggressive sequencing efforts, DNA sequence data sets have been growing at a rapid pace. To gain from these advances, it is important to provide life science researchers with the ability to process and query large sequence data sets. For the past three decades, the suffix tree has served as a fundamental data structure in processing sequential data sets. However, tree construction times on large data sets have been excessive. While parallel suffix tree construction is an obvious solution to reduce execution times, poor locality of reference has limited parallel performance. In this paper, we show that through careful parallel algorithm design, this limitation can be removed, allowing tree construction to scale to massively parallel systems like the IBM Blue Gene. We demonstrate that the entire Human genome can be indexed on 1024 processors in under 15 minutes.
UR - http://www.scopus.com/inward/record.url?scp=74049134928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74049134928&partnerID=8YFLogxK
U2 - 10.1145/1654059.1654122
DO - 10.1145/1654059.1654122
M3 - Conference contribution
AN - SCOPUS:74049134928
SN - 9781605587448
T3 - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
BT - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
T2 - Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Y2 - 14 November 2009 through 20 November 2009
ER -