### Abstract

We study a natural probabilistic model for motif discovery that has been used to experimentally test the effectiveness of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet ∑ A motif G = g _{1}g_{2} ⋯ gm is a string of m characters. Each background sequence is implanted a probabilistically generated approximate copy of G. For a probabilistically generated approximate copy b_{1}b _{2} ⋯ b_{m} of G, every character is probabilistically generated such that the probability for b_{i} ≠ g_{i} is at most a. It has been conjectured that multiple background sequences can help with finding faint motifs G. In this paper, we develop an efficient algorithm that can discover a hidden motif from a set of sequences for any alphabet ∑ with |∑| ≥ 2 and is applicable to DNA motif discovery. We prove that for a ≤ 1/4 (1 - 1|∑|) and any constant x ≥ 8, there exist positive constants c_{0}, ε, δ_{1} and δ_{2} such that if the length ρ of motif G is at least δ_{1} log n, and there are k ≥ c_{0} log n input sequences, then in O(n ^{2}+kn) time this algorithm finds the motif with probability at least 1- 1/2_{x} for every G ε ∑^{ρ}-ψ _{ρ,h,ε}(∑), where ρ is the length of the motif, h is a parameter with ρ ≥ 4h ≥ δ_{2} log n, and ψ_{ρ,h,ε}(∑) is a small subset of at most 2- ^{θ(ε2h}) fraction of the sequences in ∑^{ρ}. The constants c_{0}, ε, δ_{1} and d_{2} do not depend on x when x is a parameter of order O(log n). Our algorithm can take any number k sequences as input.

Original language | English (US) |
---|---|

Title of host publication | Theory and Applications of Models of Computation - 6th Annual Conference, TAMC 2009, Proceedings |

Pages | 231-240 |

Number of pages | 10 |

DOIs | |

State | Published - Jul 16 2009 |

Event | 6th Annual Conference on Theory and Applications of Models of Computation, TAMC 2009 - Changsha, China Duration: May 18 2009 → May 22 2009 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|

Volume | 5532 LNCS |

ISSN (Print) | 0302-9743 |

ISSN (Electronic) | 1611-3349 |

### Other

Other | 6th Annual Conference on Theory and Applications of Models of Computation, TAMC 2009 |
---|---|

Country | China |

City | Changsha |

Period | 5/18/09 → 5/22/09 |

### Fingerprint

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computer Science(all)

### Cite this

*Theory and Applications of Models of Computation - 6th Annual Conference, TAMC 2009, Proceedings*(pp. 231-240). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5532 LNCS). https://doi.org/10.1007/978-3-642-02017-9_26