Design optimization methods for genomic DNA tiling arrays

Paul Bertone, Valery Trifonov, Joel S. Rozowsky, Falk Schubert, Olof Emanuelsson, John Karro, Ming-Yang Kao, Michael Snyder, Mark Gerstein*

*Corresponding author for this work

Research output: Contribution to journalArticle

40 Citations (Scopus)

Abstract

A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.

Original languageEnglish (US)
Pages (from-to)271-281
Number of pages11
JournalGenome Research
Volume16
Issue number2
DOIs
StatePublished - Feb 1 2006

Fingerprint

Oligonucleotide Array Sequence Analysis
Genome
DNA Fragmentation
Oligonucleotides
DNA
Research

ASJC Scopus subject areas

  • Genetics
  • Genetics(clinical)

Cite this

Bertone, P., Trifonov, V., Rozowsky, J. S., Schubert, F., Emanuelsson, O., Karro, J., ... Gerstein, M. (2006). Design optimization methods for genomic DNA tiling arrays. Genome Research, 16(2), 271-281. https://doi.org/10.1101/gr.4452906
Bertone, Paul ; Trifonov, Valery ; Rozowsky, Joel S. ; Schubert, Falk ; Emanuelsson, Olof ; Karro, John ; Kao, Ming-Yang ; Snyder, Michael ; Gerstein, Mark. / Design optimization methods for genomic DNA tiling arrays. In: Genome Research. 2006 ; Vol. 16, No. 2. pp. 271-281.
@article{87d8469d73a143cf86f08aa7e65363ab,
title = "Design optimization methods for genomic DNA tiling arrays",
abstract = "A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.",
author = "Paul Bertone and Valery Trifonov and Rozowsky, {Joel S.} and Falk Schubert and Olof Emanuelsson and John Karro and Ming-Yang Kao and Michael Snyder and Mark Gerstein",
year = "2006",
month = "2",
day = "1",
doi = "10.1101/gr.4452906",
language = "English (US)",
volume = "16",
pages = "271--281",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "2",

}

Bertone, P, Trifonov, V, Rozowsky, JS, Schubert, F, Emanuelsson, O, Karro, J, Kao, M-Y, Snyder, M & Gerstein, M 2006, 'Design optimization methods for genomic DNA tiling arrays', Genome Research, vol. 16, no. 2, pp. 271-281. https://doi.org/10.1101/gr.4452906

Design optimization methods for genomic DNA tiling arrays. / Bertone, Paul; Trifonov, Valery; Rozowsky, Joel S.; Schubert, Falk; Emanuelsson, Olof; Karro, John; Kao, Ming-Yang; Snyder, Michael; Gerstein, Mark.

In: Genome Research, Vol. 16, No. 2, 01.02.2006, p. 271-281.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Design optimization methods for genomic DNA tiling arrays

AU - Bertone, Paul

AU - Trifonov, Valery

AU - Rozowsky, Joel S.

AU - Schubert, Falk

AU - Emanuelsson, Olof

AU - Karro, John

AU - Kao, Ming-Yang

AU - Snyder, Michael

AU - Gerstein, Mark

PY - 2006/2/1

Y1 - 2006/2/1

N2 - A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.

AB - A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.

UR - http://www.scopus.com/inward/record.url?scp=32044460122&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32044460122&partnerID=8YFLogxK

U2 - 10.1101/gr.4452906

DO - 10.1101/gr.4452906

M3 - Article

C2 - 16365382

AN - SCOPUS:32044460122

VL - 16

SP - 271

EP - 281

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 2

ER -

Bertone P, Trifonov V, Rozowsky JS, Schubert F, Emanuelsson O, Karro J et al. Design optimization methods for genomic DNA tiling arrays. Genome Research. 2006 Feb 1;16(2):271-281. https://doi.org/10.1101/gr.4452906