TY - GEN
T1 - Sequence-specific sequence comparison using pairwise statistical significance
AU - Agrawal, Ankit
AU - Choudhary, Alok
AU - Huang, Xiaoqiu
N1 - Funding Information:
The authors would like to thank Dr. Sean Eddy for making the C routines of censored maximum likelihood fitting available online, Dr. William R. Pearson for making the benchmark protein comparison database available online, and Dr. Volker Brendel for helpful discussions and for providing links to the data. This work was supported in part by NSF grants CNS-0551639, IIS-0536994, HECURA CCF-0621443, SDCI OCI-0724599, and IIS-0905205; DOE FASTOS award number DE-FG02-08ER25848; and DOE SCIDAC-2: Scientific Data Management Center for Enabling Technologies (CET) grant DE-FC02-07ER25808.
PY - 2011
Y1 - 2011
AB - The deluge of biological sequence data in the public domain makes sequence comparison one of the most fundamental computational problems in bioinformatics. Biologists routinely use pairwise alignment programs to identify similar or, more specifically, related sequences (those having a common ancestor). Almost everything in bioinformatics depends on the inter-relationship between sequence, structure, and function (all encapsulated in the term relatedness), which is far from being well understood. The potential relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This chapter summarizes recent advances in accurately estimating the statistical significance of pairwise local alignment for the purpose of identifying related sequences by making the sequence comparison process more sequence specific. It also compares the use of pairwise statistical significance for ranking database sequences against well-known database search programs such as BLAST, PSI-BLAST, and SSEARCH. As expected, sequence-comparison performance (evaluated in terms of retrieval accuracy) improves significantly as the sequence comparison process is made more sequence specific. Shortcomings of currently used approaches and some potentially useful directions for future work are also presented.
UR - http://www.scopus.com/inward/record.url?scp=79958014062&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79958014062&partnerID=8YFLogxK
U2 - 10.1007/978-1-4419-7046-6_30
DO - 10.1007/978-1-4419-7046-6_30
M3 - Conference contribution
C2 - 21431570
AN - SCOPUS:79958014062
SN - 9781441970459
T3 - Advances in Experimental Medicine and Biology
SP - 297
EP - 306
BT - Software Tools and Algorithms for Biological Systems
A2 - Arabnia, Hamid
A2 - Tran, Quoc-Nam
ER -