Pairwise statistical significance versus database statistical significance for local alignment of protein sequences

Ankit Agrawal*, Volker Brendel, Xiaoqiu Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

An important aspect of pairwise sequence comparison is assessing the statistical significance of the alignment. Most of the currently popular alignment programs report the statistical significance of an alignment in the context of a database search. This database statistical significance depends on the database, and hence the same alignment of a pair of sequences may be assigned different statistical significance values in different databases. In this paper, we explore the use of pairwise statistical significance, which is independent of any database and can be useful when we have only a pair of sequences and want to comment on their relatedness. We compared different methods and determined that censored maximum likelihood fitting of the score distribution to the right of the peak is the most accurate method for estimating pairwise statistical significance. We evaluated this method in an experiment with a subset of CATH2.3, which had previously been used by other authors as a benchmark data set for protein comparison. Comparison of the results with the database statistical significance reported by popular programs like SSEARCH and PSI-BLAST indicates that the results of pairwise statistical significance are comparable to, and sometimes significantly better than, those of database statistical significance (with SSEARCH). However, PSI-BLAST performs best, presumably due to its use of query-specific substitution matrices.
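
The censored maximum likelihood fit mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that null local alignment scores follow a Gumbel (extreme value) distribution, as is standard for local alignment statistics, that scores left of a cutoff are only counted (left censoring), and that the peak is estimated as the most common rounded score. The function names `fit_gumbel_censored` and `pairwise_pvalue` are hypothetical, and the null scores below are synthetic stand-ins because the abstract does not say how they are generated.

```python
import math
import random
from collections import Counter


def _sums(lam, obs, n_cens):
    """Return S and T for shifted scores y = score - cutoff."""
    weights = [math.exp(-lam * y) for y in obs]
    s = n_cens + sum(weights)  # each censored score contributes exp(0) = 1
    t = sum(y * w for y, w in zip(obs, weights))  # censored terms contribute 0
    return s, t


def fit_gumbel_censored(scores, cutoff):
    """Fit Gumbel (lam, mu) by maximum likelihood, left-censoring at cutoff.

    Scores >= cutoff enter through the density; scores < cutoff enter only
    through the probability of falling below the cutoff.
    """
    obs = [x - cutoff for x in scores if x >= cutoff]
    n_obs, n_cens = len(obs), len(scores) - len(obs)
    sum_obs = sum(obs)
    if n_obs < 2 or n_cens < 1 or sum_obs <= 0:
        raise ValueError("need scores on both sides of the cutoff")

    def score_eq(lam):
        # Derivative of the log-likelihood in lam after profiling out mu.
        s, t = _sums(lam, obs, n_cens)
        return n_obs / lam - sum_obs + n_obs * t / s

    lo, hi = 1e-6, 1.0
    while score_eq(hi) > 0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):  # plain bisection on the score equation
        mid = 0.5 * (lo + hi)
        if score_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    s, _ = _sums(lam, obs, n_cens)
    mu = cutoff + math.log(n_obs / s) / lam
    return lam, mu


def pairwise_pvalue(score, lam, mu):
    """P(null score >= score) under the fitted Gumbel distribution."""
    return -math.expm1(-math.exp(-lam * (score - mu)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic null scores (assumption): Gumbel with lam = 0.27, mu = 30.
    null_scores = [30.0 - math.log(-math.log(rng.random())) / 0.27
                   for _ in range(2000)]
    # Peak estimated as the most common rounded score (assumption).
    peak = Counter(round(x) for x in null_scores).most_common(1)[0][0]
    lam, mu = fit_gumbel_censored(null_scores, peak)
    print(lam, mu, pairwise_pvalue(60.0, lam, mu))
```

Profiling out the location parameter reduces the fit to a one-dimensional root search, which keeps the sketch dependency-free. A production fit would more likely use a numerical optimizer and check convergence.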

Original language: English (US)
Title of host publication: Bioinformatics Research and Applications - Fourth International Symposium, ISBRA 2008, Proceedings
Pages: 50-61
Number of pages: 12
State: Published - Aug 27 2008
Event: 4th International Symposium on Bioinformatics Research and Applications, ISBRA 2008 - Atlanta, GA, United States
Duration: May 6 2008 to May 9 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4983 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Symposium on Bioinformatics Research and Applications, ISBRA 2008
Country/Territory: United States
City: Atlanta, GA
Period: 5/6/08 to 5/9/08

Keywords

  • Database statistical significance
  • Homologs
  • Pairwise local alignment
  • Pairwise statistical significance

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
