TY - GEN

T1 - MPIPairwiseStatSig

T2 - 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010

AU - Agrawal, Ankit

AU - Misra, Sanchit

AU - Honbo, Daniel

AU - Choudhary, Alok

PY - 2010

Y1 - 2010

N2 - Sequence comparison is considered a cornerstone application in bioinformatics, which forms the basis of many other applications. In particular, pairwise sequence alignment is a fundamental step in numerous sequence-comparison-based applications, where the typical purpose of the pairwise sequence alignment step is homology detection, i.e., identifying related sequences. Estimation of the statistical significance of a pairwise sequence alignment is crucial in homology detection. A recent development in the field is the use of pairwise statistical significance as an alternative to database statistical significance. Although pairwise statistical significance has been shown to be potentially superior to database statistical significance for homology detection (evaluated in terms of retrieval accuracy), it is currently very time-consuming because it involves generating an empirical score distribution by aligning one sequence of the sequence pair with N random shuffles of the other sequence. In this paper, we present a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using MPI. Distributing the most compute-intensive portions of the pairwise statistical significance estimation procedure across multiple processors has been shown to result in near-linear speed-ups for the application.

AB - Sequence comparison is considered a cornerstone application in bioinformatics, which forms the basis of many other applications. In particular, pairwise sequence alignment is a fundamental step in numerous sequence-comparison-based applications, where the typical purpose of the pairwise sequence alignment step is homology detection, i.e., identifying related sequences. Estimation of the statistical significance of a pairwise sequence alignment is crucial in homology detection. A recent development in the field is the use of pairwise statistical significance as an alternative to database statistical significance. Although pairwise statistical significance has been shown to be potentially superior to database statistical significance for homology detection (evaluated in terms of retrieval accuracy), it is currently very time-consuming because it involves generating an empirical score distribution by aligning one sequence of the sequence pair with N random shuffles of the other sequence. In this paper, we present a parallel algorithm for pairwise statistical significance estimation, called MPIPairwiseStatSig, implemented in C using MPI. Distributing the most compute-intensive portions of the pairwise statistical significance estimation procedure across multiple processors has been shown to result in near-linear speed-ups for the application.

KW - Experimentation

UR - http://www.scopus.com/inward/record.url?scp=78650010376&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650010376&partnerID=8YFLogxK

U2 - 10.1145/1851476.1851545

DO - 10.1145/1851476.1851545

M3 - Conference contribution

AN - SCOPUS:78650010376

SN - 9781605589428

T3 - HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

SP - 470

EP - 476

BT - HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Y2 - 21 June 2010 through 25 June 2010

ER -