TY - GEN
T1 - An Integer Programming approach to novel transcript reconstruction from paired-end RNA-Seq reads
AU - Mangul, Serghei
AU - Caciula, Adrian
AU - Al Seesi, Sahar
AU - Brinza, Dumitru
AU - Banday, Abdul Rouf
AU - Kanadia, Rahul
PY - 2012
Y1 - 2012
N2 - Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, has become the technology of choice for performing gene expression profiling. However, reconstruction of full-length novel transcripts from RNA-Seq data remains challenging due to the short read length delivered by most existing sequencing technologies. We propose a novel statistical genome-guided method called "Transcriptome Reconstruction using Integer Programming" (TRIP) that incorporates fragment length distribution into novel transcript reconstruction from paired-end RNA-Seq reads. TRIP creates a splice graph based on aligned RNA-Seq reads and enumerates all maximal paths corresponding to putative transcripts. The problem of selecting true transcripts is formulated as an integer program (IP) which minimizes the set of selected transcripts yielding a good statistical fit between the fragment length distribution (empirically determined during library preparation) and fragment lengths implied by mapped read pairs. Experimental results on both real and synthetic datasets show that TRIP is more accurate than methods ignoring fragment length distribution information.
KW - Algorithms
UR - http://www.scopus.com/inward/record.url?scp=84869420829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869420829&partnerID=8YFLogxK
U2 - 10.1145/2382936.2382983
DO - 10.1145/2382936.2382983
M3 - Conference contribution
AN - SCOPUS:84869420829
SN - 9781450316705
T3 - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
SP - 369
EP - 376
BT - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
T2 - 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012
Y2 - 7 October 2012 through 10 October 2012
ER -