Pairwise statistical significance of local sequence alignment using multiple parameter sets

Ankit Agrawal*, Xiaoqiu Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Preliminary results for a knowledge discovery application such as homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives significantly better coverage than using a single parameter set, at least at some error levels. Also, the fact that performance does not degrade when using multiple parameter sets is strong evidence that the assumption that the score distribution follows an extreme value distribution remains valid even when using multiple parameter sets. Results of pairwise statistical significance using multiple parameter sets are further shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH.
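
As a rough illustration of the extreme value assumption mentioned in the abstract, the sketch below computes the probability that an alignment score reaches a given value under the standard Gumbel-type form used for local alignment scores. It is not the authors' implementation. The function name, the parameter names K and lam, and all numeric values are hypothetical and chosen only for illustration; in practice the distribution parameters would be estimated for the specific sequence pair and scoring scheme.

```python
import math


def pairwise_p_value(score, m, n, K, lam):
    """Return P(S >= score) under an extreme value (Gumbel-type) model.

    m, n   : lengths of the two sequences
    K, lam : distribution parameters (assumed to be already estimated)
    """
    # Expected number of chance alignments scoring at least `score`.
    expected = K * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), computed with expm1 for accuracy when expected is tiny.
    return -math.expm1(-expected)


# Illustrative values only; they do not come from the paper.
p = pairwise_p_value(score=50, m=100, n=100, K=0.1, lam=0.3)
print(f"P-value: {p:.3e}")
```

A smaller P-value means that the observed score is less likely to arise by chance under the fitted distribution.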

Original language: English (US)
Title of host publication: Proceedings of the 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages: 53-59
Number of pages: 7
State: Published - 2008
Event: 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: Oct 26 2008 to Oct 30 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 2nd International Workshop on Data and Text Mining in Bioinformatics, DTMBIO'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/Territory: United States
City: Napa Valley, CA
Period: 10/26/08 to 10/30/08

Keywords

  • Database statistical significance
  • Homologs
  • Pairwise statistical significance
  • Parameter set
  • Sequence alignment

ASJC Scopus subject areas

  • General Business, Management and Accounting
  • General Decision Sciences
