TY - GEN
T1 - Pairwise statistical significance of local sequence alignment using multiple parameter sets
AU - Agrawal, Ankit
AU - Huang, Xiaoqiu
PY - 2008
Y1 - 2008
N2 - Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Preliminary results for a knowledge discovery application such as homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives significantly better coverage than using a single parameter set, at least at some error levels. Also, the fact that the performance does not degrade when using multiple parameter sets is strong evidence that the assumption that the score distribution follows an extreme value distribution is valid even when using multiple parameter sets. Results of pairwise statistical significance using multiple parameter sets are further shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, SSEARCH.
AB - Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Preliminary results for a knowledge discovery application such as homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives significantly better coverage than using a single parameter set, at least at some error levels. Also, the fact that the performance does not degrade when using multiple parameter sets is strong evidence that the assumption that the score distribution follows an extreme value distribution is valid even when using multiple parameter sets. Results of pairwise statistical significance using multiple parameter sets are further shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, SSEARCH.
KW - Database statistical significance
KW - Homologs
KW - Pairwise statistical significance
KW - Parameter set
KW - Sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=70349231370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349231370&partnerID=8YFLogxK
U2 - 10.1145/1458449.1458462
DO - 10.1145/1458449.1458462
M3 - Conference contribution
AN - SCOPUS:70349231370
SN - 9781605582511
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 53
EP - 59
BT - Proceedings of the 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
T2 - 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Y2 - 26 October 2008 through 30 October 2008
ER -