TY - GEN
T1 - Maximum likelihood estimation of incomplete genomic spectrum from HTS data
AU - Mangul, Serghei
AU - Astrovskaya, Irina
AU - Nicolae, Marius
AU - Tork, Bassam
AU - Mandoiu, Ion
AU - Zelikovsky, Alex
PY - 2011
Y1 - 2011
AB - High-throughput sequencing makes it possible to process samples containing multiple genomic sequences and then estimate their frequencies or even assemble them. The maximum likelihood estimation of the frequencies of these sequences from observed reads can be performed efficiently with the expectation-maximization (EM) method, assuming that the sequences present in the sample are known. Frequently, such knowledge is incomplete: e.g., in RNA-Seq not all isoforms are known, and when sequencing viral quasispecies their sequences are unknown. We propose to enhance EM with a virtual string and incorporate it into frequency estimation tools for RNA-Seq and quasispecies sequencing. Our simulations show that EM enhanced with the virtual string estimates string frequencies more accurately than the original methods and that it can find the reads from missing quasispecies, thus enabling their reconstruction.
KW - RNA-Sequencing
KW - expectation maximization
KW - high-throughput sequencing
KW - viral quasispecies
UR - http://www.scopus.com/inward/record.url?scp=80052996140&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052996140&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23038-7_19
DO - 10.1007/978-3-642-23038-7_19
M3 - Conference contribution
AN - SCOPUS:80052996140
SN - 9783642230370
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 213
EP - 224
BT - Algorithms in Bioinformatics - 11th International Workshop, WABI 2011, Proceedings
T2 - 11th Workshop on Algorithms in Bioinformatics, WABI 2011
Y2 - 5 September 2011 through 7 September 2011
ER -