TY - JOUR
T1 - HPeak
T2 - An HMM-based algorithm for defining read-enriched regions in ChIP-Seq data
AU - Qin, Zhaohui S.
AU - Yu, Jianjun
AU - Shen, Jincheng
AU - Maher, Christopher A.
AU - Hu, Ming
AU - Kalyana-Sundaram, Shanker
AU - Yu, Jindan
AU - Chinnaiyan, Arul M.
N1 - Funding Information:
We thank Dr. Terrence Barrette, Xuhong Cao and members of the Arul Chinnaiyan Laboratory for valuable suggestions and comments on earlier versions of the program and manuscript. We thank Dr. Tanya Teslovich and Ms. Jill Granger for critical reading of the manuscript. We thank Dr. Ghia Euskirchen of the Michael Snyder Laboratory for providing the list of identified peaks from the STAT1 ChIP-chip experiment. We are grateful to Dr. Olivier Elemento at Weill Cornell Medical College for his help with describing and using the ChIPseeqer software. We are grateful to Dr. Hongkai Ji at Johns Hopkins University for his help with using the CisGenome software. CAM was supported by an NIH Ruth L. Kirschstein postdoctoral training grant and currently derives support from the American Association for Cancer Research Amgen Fellowship in Clinical/Translational Research and the Canary Foundation and American Cancer Society Early Detection Postdoctoral Fellowship. JY was supported in part by Department of Defense New Investigator Award PC080665 and National Institutes of Health K99/R00 grant K99CA129565. AMC is supported by the Department of Defense Era of Hope grant BC075023, the National Functional Genomics Center W81XWH-09-2-0014, the National Cancer Institute SPORE in Prostate Cancer P50 CA69568, the Burroughs Wellcome Fund, and the Prostate Cancer Foundation. AMC is also an American Cancer Society Research Professor. JC, MH and ZSQ were supported in part by NIH grant R01HG005119. ZSQ was also supported in part by NIH Comprehensive Cancer Center grant CA 46592 and Prostate SPORE grant CA69568.
PY - 2010/7/2
Y1 - 2010/7/2
AB - Background: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data it generates. Results: Here we introduce HPeak, a hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach that allows rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage. Conclusions: Using biologically relevant data collections, we found that, compared with other currently available ChIP-Seq analysis approaches, HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to control sequences. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. The additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
UR - http://www.scopus.com/inward/record.url?scp=77954041808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954041808&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-11-369
DO - 10.1186/1471-2105-11-369
M3 - Article
C2 - 20598134
AN - SCOPUS:77954041808
SN - 1471-2105
VL - 11
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 369
ER -