On the detection and refinement of transcription factor binding sites using ChIP-Seq data

Ming Hu, Jindan Yu, Jeremy M.G. Taylor, Arul M. Chinnaiyan, Zhaohui S. Qin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

82 Scopus citations

Abstract

Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein-DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better detect and refine transcription factor binding sites (TFBS). We introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for TFBS motif discovery in ChIP-Seq data. We propose a Bayesian model that incorporates sequencing depth information to aid motif identification. Our model also allows intra-motif dependency to describe more accurately the underlying motif pattern. Our algorithm combines stochastic sampling and deterministic 'greedy' search steps into a novel hybrid iterative scheme. This combination accelerates the computation process. Simulation studies demonstrate favorable performance of HMS compared to other existing methods. When applying HMS to real ChIP-Seq datasets, we find that (i) the accuracy of existing TFBS motif patterns can be significantly improved; and (ii) there is significant intra-motif dependency inside all the TFBS motifs we tested; modeling these dependencies further improves the accuracy of these TFBS motif patterns.
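The hybrid scheme described in the abstract can be illustrated with a toy sketch. This is not the HMS implementation: it assumes exactly one site per sequence, an independent-column position weight matrix (no intra-motif dependency), and a position prior proportional to mean read depth over the candidate window. It also simplifies the hybrid schedule to a run of stochastic sampling sweeps followed by greedy sweeps. All function and parameter names (`build_pwm`, `site_weights`, `hybrid_motif_search`, `n_sample`, `n_greedy`) are hypothetical.

```python
import random

BASES = "ACGT"

def build_pwm(seqs, starts, width, skip=None, pseudo=0.5):
    """Column-wise base frequencies from the current sites, leaving out sequence `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][BASES.index(seq[s + j])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def site_weights(seq, depth, pwm, width, bg=0.25):
    """Unnormalized posterior weight of each start: motif/background ratio times depth prior."""
    weights = []
    for s in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            ratio *= pwm[j][BASES.index(seq[s + j])] / bg
        # Assumed prior: grows with the mean read depth over the window.
        prior = 1.0 + sum(depth[s:s + width]) / width
        weights.append(ratio * prior)
    return weights

def hybrid_motif_search(seqs, depths, width, n_sample=50, n_greedy=10, seed=0):
    """Stochastic (Gibbs-style) sweeps followed by deterministic greedy sweeps."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for sweep in range(n_sample + n_greedy):
        for i, seq in enumerate(seqs):
            pwm = build_pwm(seqs, starts, width, skip=i)
            w = site_weights(seq, depths[i], pwm, width)
            if sweep < n_sample:
                # Stochastic step: draw the site in proportion to its weight.
                starts[i] = rng.choices(range(len(w)), weights=w)[0]
            else:
                # Greedy step: take the highest-weight site.
                starts[i] = max(range(len(w)), key=w.__getitem__)
    return starts, build_pwm(seqs, starts, width)
```

Calling `hybrid_motif_search(seqs, depths, width=6)` with equal-length lists of sequences and per-base depth vectors returns one start position per sequence and the final matrix. The greedy sweeps trade exploration for speed, which is the motivation the abstract gives for combining the two kinds of step.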

Original language: English (US)
Article number: gkp1180
Pages (from-to): 2154-2167
Number of pages: 14
Journal: Nucleic Acids Research
Volume: 38
Issue number: 7
State: Published - Jan 7 2010

ASJC Scopus subject areas

  • Genetics
