Abstract
Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Preliminary results for a knowledge discovery application such as homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives significantly better coverage than using a single parameter set, at least at some error levels. Also, the fact that performance does not degrade when using multiple parameter sets is strong evidence that the assumption that the score distribution follows an extreme value distribution remains valid even when using multiple parameter sets. Results of pairwise statistical significance using multiple parameter sets are further shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH.
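The extreme value assumption mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard Gumbel-type form for local alignment scores, P(S >= x) = 1 - exp(-K m n e^(-λx)), and evaluates it once per parameter set. The function names, the `param_sets` dictionary, and all numeric values below are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' fitting procedure.

```python
import math


def evd_pvalue(score, K, lam, m, n):
    """P(S >= score) under the extreme value form 1 - exp(-K*m*n*exp(-lam*score)).

    K and lam are the statistical parameters, assumed here to have been
    estimated elsewhere; m and n are the two sequence lengths.
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) and stays accurate when x is tiny.
    return -math.expm1(-expected_hits)


def pvalues_per_set(scores, param_sets, m, n):
    """Return one P-value per parameter set (hypothetical helper).

    scores maps a parameter-set name to the alignment score obtained with it;
    param_sets maps the same name to its assumed K and lam.
    """
    return {
        name: evd_pvalue(scores[name], p["K"], p["lam"], m, n)
        for name, p in param_sets.items()
    }


# Placeholder values only: two made-up parameter sets and made-up scores.
param_sets = {
    "set_A": {"K": 0.04, "lam": 0.27},
    "set_B": {"K": 0.03, "lam": 0.25},
}
scores = {"set_A": 62.0, "set_B": 58.0}

if __name__ == "__main__":
    for name, p in pvalues_per_set(scores, param_sets, m=250, n=300).items():
        print(f"{name}: P-value = {p:.3e}")
```

How the per-set P-values are combined into a single estimate is not stated in the abstract, so the sketch stops at reporting them separately.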
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08 |
Pages | 53-59 |
Number of pages | 7 |
DOIs | |
State | Published - Dec 1 2008 |
Event | 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States; Duration: Oct 26 2008 → Oct 30 2008 |
Other
Other | 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08 |
---|---|
Country | United States |
City | Napa Valley, CA |
Period | 10/26/08 → 10/30/08 |
Keywords
- Database statistical significance
- Homologs
- Pairwise statistical significance
- Parameter set
- Sequence alignment
ASJC Scopus subject areas
- Decision Sciences(all)
- Business, Management and Accounting(all)