Estimating pairwise statistical significance of protein local alignments using a clustering-classification approach based on amino acid composition

Ankit Agrawal*, Arka Ghosh, Xiaoqiu Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

A central question in pairwise sequence comparison is assessing the statistical significance of the alignment. For ungapped alignments with one substitution matrix, the alignment score distribution is known to follow an extreme value distribution with analytically calculable parameters K and λ. No statistical theory is currently available for the gapped case or for alignments using multiple scoring matrices, although their score distributions are known to closely follow an extreme value distribution, and the corresponding parameters can be estimated by simulation. Ideal estimation would require simulation for each sequence pair, which is impractical. In this paper, we present a simple clustering-classification approach based on amino acid composition to estimate K and λ for a given sequence pair and scoring scheme, including schemes that use multiple parameter sets. The resulting K and λ values for different cluster pairs vary widely even for the same scoring scheme, underscoring the heavy dependence of K and λ on amino acid composition. The proposed approach is an attempt to separate out the influence of amino acid composition in estimating the statistical significance of pairwise protein alignments. Experiments, together with an analysis of other approaches to estimating the statistical parameters, indicate that the methods used in this work estimate statistical significance with good accuracy.
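As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below classifies each sequence to a composition cluster, looks up K and λ for the resulting cluster pair, and evaluates the standard extreme value tail P(S ≥ x) ≈ 1 − exp(−K·m·n·e^(−λx)) for sequence lengths m and n. All function names, the toy centroids, and the K and λ values are hypothetical placeholders; in practice the centroids would come from clustering a sequence database and the parameters from simulation for each cluster pair and scoring scheme.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_cluster(vec, centroids):
    """Classify a composition vector to the index of the closest centroid.

    Euclidean distance is an assumption made for this sketch.
    """
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def evd_pvalue(score, m, n, K, lam):
    """P(S >= score) = 1 - exp(-K * m * n * exp(-lam * score))."""
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_hits)


def pairwise_pvalue(seq1, seq2, score, centroids, params):
    """Estimate a pairwise P-value from cluster-pair specific (K, lambda)."""
    c1 = nearest_cluster(composition(seq1), centroids)
    c2 = nearest_cluster(composition(seq2), centroids)
    K, lam = params[(c1, c2)]
    return evd_pvalue(score, len(seq1), len(seq2), K, lam)


# Toy data: two made-up centroids (uniform and alanine-rich composition).
TOY_CENTROIDS = [
    [0.05] * 20,
    [0.24] + [0.04] * 19,
]

# Made-up (K, lambda) per cluster pair; NOT values from the paper.
TOY_PARAMS = {
    (0, 0): (0.10, 0.30),
    (0, 1): (0.07, 0.27),
    (1, 0): (0.07, 0.27),
    (1, 1): (0.05, 0.25),
}

if __name__ == "__main__":
    p = pairwise_pvalue("AAAAAAAAAC", "ACDEFGHIKLMNPQRSTVWY", 40,
                        TOY_CENTROIDS, TOY_PARAMS)
    print(f"estimated P-value: {p:.3g}")
```

The lookup table is what makes this cheaper than per-pair simulation: the expensive parameter estimation is done once per cluster pair, and each new sequence pair only needs two nearest-centroid classifications.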

Original language: English (US)
Title of host publication: Bioinformatics Research and Applications - Fourth International Symposium, ISBRA 2008, Proceedings
Pages: 62-73
Number of pages: 12
DOI: https://doi.org/10.1007/978-3-540-79450-9_7
State: Published - Aug 27 2008
Event: 4th International Symposium on Bioinformatics Research and Applications, ISBRA 2008 - Atlanta, GA, United States
Duration: May 6 2008 to May 9 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4983 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Symposium on Bioinformatics Research and Applications, ISBRA 2008
Country: United States
City: Atlanta, GA
Period: 5/6/08 to 5/9/08

Keywords

  • Classification
  • Clustering
  • Pairwise local alignment
  • Statistical significance

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

    Agrawal, A., Ghosh, A., & Huang, X. (2008). Estimating pairwise statistical significance of protein local alignments using a clustering-classification approach based on amino acid composition. In Bioinformatics Research and Applications - Fourth International Symposium, ISBRA 2008, Proceedings (pp. 62-73). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4983 LNBI). https://doi.org/10.1007/978-3-540-79450-9_7