This chapter addresses the problem of finding weak similarities or distant relationships among proteins for which only the sequences are known. Multiple fixed-length sequence comparison (MSC) is conceptually no different from pairwise fixed-length comparison, except that the scoring function is generalized and there are many more ways to form combinations of runs. The highest score arising from the inter-comparison of sequences is statistically significant only if it is higher than what chance alone would be expected to produce. The FORTRAN program that implements the sequence inter-comparison algorithm requires, in addition to the sequences, a few parameters that control its operation. The run length, k, should be set to around 20 to give the most sensitive sequence comparison. A suitable computer configuration for running MSC is a mainframe, minicomputer, or workstation. A small personal computer can even be used to good effect, because the memory requirements of the program are quite modest. Most of the memory is used by the heaps, and at most two of these are actually retained by the program at any given time.
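As a rough illustration of the pairwise fixed-length case that MSC generalizes, the sketch below compares every run of length k in one sequence with every run of length k in another and keeps only the best few pairs in a bounded heap. It is not the chapter's FORTRAN program: the function names `run_score` and `best_runs`, the `keep` parameter, and the identity-count scoring are all assumptions made for illustration, and a real comparison would likely use a residue substitution table instead.

```python
import heapq


def run_score(a, b):
    """Score two equal-length runs by counting identical positions.

    Assumption: identity counting stands in for whatever scoring
    function the chapter actually uses.
    """
    return sum(1 for x, y in zip(a, b) if x == y)


def best_runs(seq_a, seq_b, k=20, keep=5):
    """Return up to `keep` best (score, i, j) run pairs, best first.

    i and j are the 0-based start positions of the length-k runs
    in seq_a and seq_b respectively.
    """
    heap = []  # min-heap holding at most `keep` entries
    for i in range(len(seq_a) - k + 1):
        for j in range(len(seq_b) - k + 1):
            entry = (run_score(seq_a[i:i + k], seq_b[j:j + k]), i, j)
            if len(heap) < keep:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # Replace the current worst retained pair.
                heapq.heappushpop(heap, entry)
    return sorted(heap, reverse=True)
```

Because the heap never holds more than `keep` entries, memory use stays small regardless of sequence length, which is in the spirit of the modest memory requirements noted above. Judging whether the top score is significant would still require comparing it with scores obtained by chance, for example from shuffled sequences; that step is not shown here.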
ASJC Scopus subject areas
- Molecular Biology