Multiple sequence alignment

David J. Bacon*, Wayne F. Anderson

*Corresponding author for this work

Research output: Contribution to journal › Article

89 Scopus citations

Abstract

A method has been developed for aligning segments of several sequences at once. The number of search steps depends only polynomially on the number of sequences, instead of exponentially, because most alignments are rejected without being evaluated explicitly. A data structure herein called the "heap" facilitates this process. For a set of n sequence segments, the overall similarity is taken to be the sum of all the constituent segment pair similarities, which are in turn sums of corresponding residue similarity scores from a Table. The statistical models that test alignments for significance make it possible to group sequences objectively, even when most or all of the interrelationships are weak. These tests are very sensitive, while remaining quite conservative, and discourage the addition of "misfit" sequences to an existing set. The new techniques are applied to a set of five DNA-binding proteins, to a group of three enzymes that employ the coenzyme FAD, and to a control set. The alignment previously proposed for the DNA-binding proteins on the basis of structural comparisons and inspection of sequences is supported quite dramatically, and a highly significant alignment is found for the FAD-binding proteins.
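
The scoring rule stated in the abstract (overall similarity as the sum of all segment-pair similarities, each of which is a sum of per-residue table scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the segments are assumed to be ungapped and of equal length, and `TOY_TABLE` holds invented values rather than the residue similarity table used in the paper. The heap-based search and the significance tests are not shown.

```python
from itertools import combinations

# Hypothetical residue similarity table for three residues only.
# The values are made up for illustration; they are NOT the paper's table.
TOY_TABLE = {
    ("A", "A"): 2, ("G", "G"): 2, ("L", "L"): 2,
    ("A", "G"): 1, ("A", "L"): 0, ("G", "L"): -1,
}


def residue_score(a, b, table=TOY_TABLE):
    """Look up the similarity of two residues, treating the table as symmetric."""
    if (a, b) in table:
        return table[(a, b)]
    return table[(b, a)]


def pair_similarity(seg1, seg2, table=TOY_TABLE):
    """Sum the table scores of corresponding residues in two equal-length segments."""
    if len(seg1) != len(seg2):
        raise ValueError("segments must have the same length")
    return sum(residue_score(a, b, table) for a, b in zip(seg1, seg2))


def overall_similarity(segments, table=TOY_TABLE):
    """Sum pair_similarity over every unordered pair of segments."""
    return sum(pair_similarity(s, t, table) for s, t in combinations(segments, 2))


if __name__ == "__main__":
    segments = ["AGL", "AGL", "GAL"]
    print(overall_similarity(segments))  # prints 14 (6 + 4 + 4)
```

For n segments this sums n(n-1)/2 pair scores, so scoring a single candidate alignment is cheap; the expensive part is the search over candidate alignments, which the abstract says is kept polynomial by rejecting most of them without explicit evaluation.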

Original language: English (US)
Pages (from-to): 153-161
Number of pages: 9
Journal: Journal of Molecular Biology
Volume: 191
Issue number: 2
State: Published - Sep 20 1986

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
