HPeak: An HMM-based algorithm for defining read-enriched regions in ChIP-Seq data

Zhaohui S. Qin*, Jianjun Yu, Jincheng Shen, Christopher A. Maher, Ming Hu, Shanker Kalyana-Sundaram, Jindan Yu, Arul M. Chinnaiyan

*Corresponding author for this work

Research output: Contribution to journal › Article

84 Scopus citations

Abstract

Background: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data it generates.

Results: Here we introduce HPeak, a hidden Markov model (HMM)-based peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage.

Conclusions: Using biologically relevant data collections, we found that, compared with other currently available ChIP-Seq analysis approaches, HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
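As a rough illustration of the general idea in the abstract (an HMM that labels genomic bins as background or read-enriched), the sketch below runs Viterbi decoding over binned read counts with a two-state model. It is not HPeak's implementation: the Poisson emissions, the fixed rates and transition probabilities, and the function names `viterbi_two_state` and `enriched_regions` are all assumptions made for this example, and the abstract's coverage weighting scheme is not modeled.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_bg=1.0, lam_enriched=8.0,
                      p_stay=0.95, p_start_enriched=0.05):
    """Most likely state path (0 = background, 1 = enriched) for binned counts.

    All parameter values are illustrative assumptions, not values from HPeak.
    """
    if not counts:
        return []
    rates = (lam_bg, lam_enriched)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    log_trans = [[log_stay, log_switch], [log_switch, log_stay]]
    log_start = [math.log(1.0 - p_start_enriched), math.log(p_start_enriched)]

    # score[s] = best log-probability of any path ending in state s
    score = [log_start[s] + poisson_logpmf(counts[0], rates[s]) for s in (0, 1)]
    backpointers = []
    for c in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + poisson_logpmf(c, rates[s]))
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def enriched_regions(path):
    """Collapse a state path into half-open (start, end) runs of state 1."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions


counts = [0, 1, 0, 2, 1, 9, 12, 10, 8, 1, 0, 1, 0]  # made-up reads per bin
path = viterbi_two_state(counts)
print(enriched_regions(path))
```

With these made-up counts, the run of high-count bins (indices 5 through 8) is labeled enriched and reported as one half-open interval. The transition probabilities mean an isolated moderately elevated bin is not called on its own. A model-based tool would estimate the emission and transition parameters from the data rather than fixing them as done here.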

Original language: English (US)
Article number: 369
Journal: BMC Bioinformatics
Volume: 11
DOIs: https://doi.org/10.1186/1471-2105-11-369
State: Published - Jul 2 2010

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Qin, Z. S., Yu, J., Shen, J., Maher, C. A., Hu, M., Kalyana-Sundaram, S., Yu, J., & Chinnaiyan, A. M. (2010). HPeak: An HMM-based algorithm for defining read-enriched regions in ChIP-Seq data. BMC Bioinformatics, 11, Article 369. https://doi.org/10.1186/1471-2105-11-369