TY - JOUR
T1 - Improved alignment of nucleosome DNA sequences using a mixture model.
AU - Wang, Ji Ping Z
AU - Widom, Jonathan
N1 - Funding Information:
This work is supported by the Joint NSF/NIGMS Initiative to Support Research in the Area of Mathematical Biology grant # 1 R01 GM075313 awarded to J.-P. Wang and J. Widom. The authors thank Andrew Travers for providing the chicken nucleosome core DNA sequence data, for help with the center alignment and Fourier analysis, and for comments on the manuscript; the authors thank Kelly Thayer for helpful suggestions and comments on the manuscript, Eran Segal for discussion and providing the randomly chosen chicken genomic sequences, and Bruce Spencer and Wenxin Jiang for helpful discussions. Funding to pay the Open Access publication charges for this article was provided by National Institute of General Medical Sciences.
PY - 2005
Y1 - 2005
N2 - DNA sequences that are present in nucleosomes have a preferential approximately 10 bp periodicity of certain dinucleotide signals, but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type emits according to a probability distribution around a series of 'hot spots' that are equally spaced along nucleosomal DNA with 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, however, with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with that obtained using the same model on non-nucleosomal sequences.
AB - DNA sequences that are present in nucleosomes have a preferential approximately 10 bp periodicity of certain dinucleotide signals, but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type emits according to a probability distribution around a series of 'hot spots' that are equally spaced along nucleosomal DNA with 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, however, with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with that obtained using the same model on non-nucleosomal sequences.
UR - http://www.scopus.com/inward/record.url?scp=33644643053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33644643053&partnerID=8YFLogxK
U2 - 10.1093/nar/gki977
DO - 10.1093/nar/gki977
M3 - Article
C2 - 16339114
AN - SCOPUS:33644643053
SN - 0305-1048
VL - 33
SP - 6743
EP - 6755
JO - Nucleic acids research
JF - Nucleic acids research
IS - 21
ER -