Accurate viral population assembly from ultra-deep sequencing data

Serghei Mangul*, Nicholas C. Wu, Nicholas Mancuso, Alex Zelikovsky, Ren Sun, Eleazar Eskin

*Corresponding author for this work

Research output: Contribution to journal › Article

26 Citations (Scopus)

Abstract

Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors.

Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.
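The two ideas named in the abstract, barcode-based error removal and expectation-maximization abundance estimation, can be illustrated with a minimal Python sketch. This is not the VGA implementation: the function names `consensus_by_barcode` and `em_abundances`, the `min_reads` threshold, and the input layout are all hypothetical and chosen only for illustration. The sketch assumes that reads sharing a barcode are equal-length copies of one original fragment, and that every read is compatible with at least one candidate variant.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(tagged_reads, min_reads=3):
    """Collapse reads sharing a barcode into one majority-vote consensus read.

    tagged_reads: iterable of (barcode, sequence) pairs. Reads with the same
    barcode are assumed to be equal-length copies of one original fragment.
    min_reads is a hypothetical threshold, not a value taken from the paper.
    """
    groups = defaultdict(list)
    for barcode, seq in tagged_reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few copies to outvote a sequencing error
        # Take the most common base in each column of the stacked reads.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def em_abundances(read_counts, compat, n_iter=200):
    """Estimate variant frequencies with a basic EM loop.

    read_counts[i]: number of observations of distinct read i.
    compat[i]: indices of the variants that read i is consistent with
    (assumed non-empty for every read).
    """
    n_variants = 1 + max(j for variants in compat for j in variants)
    freqs = [1.0 / n_variants] * n_variants
    total = float(sum(read_counts))

    for _ in range(n_iter):
        expected = [0.0] * n_variants
        # E-step: split each read among its compatible variants in
        # proportion to the current frequency estimates.
        for count, variants in zip(read_counts, compat):
            norm = sum(freqs[j] for j in variants)
            for j in variants:
                expected[j] += count * freqs[j] / norm
        # M-step: renormalise the expected read counts into frequencies.
        freqs = [e / total for e in expected]
    return freqs


if __name__ == "__main__":
    reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT")]
    print(consensus_by_barcode(reads))
    print(em_abundances([60, 20, 20], [[0], [1], [0, 1]]))
```

In this toy setting an error that appears in only one copy of a fragment is outvoted by the other copies, and reads that match several variants are shared among them according to the current abundance estimates. A real pipeline would also need read alignment, handling of unequal read lengths, and a convergence check instead of a fixed iteration count.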

Original language: English (US)
Journal: Bioinformatics
Volume: 30
Issue number: 12
DOI: https://doi.org/10.1093/bioinformatics/btu295
State: Published - Jun 15 2014

Fingerprint

Virus Assembly
High-Throughput Nucleotide Sequencing
Sequencing
Genes
Viral Genome
Population
Genome
Coverage
Viruses
Technology
Expectation-maximization Algorithm
HIV
Fidelity
Virus
Fragment
Eliminate

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Mangul, S., Wu, N. C., Mancuso, N., Zelikovsky, A., Sun, R., & Eskin, E. (2014). Accurate viral population assembly from ultra-deep sequencing data. Bioinformatics, 30(12). https://doi.org/10.1093/bioinformatics/btu295
@article{7f0396c8494e4e11a835e3cf92bdc215,
title = "Accurate viral population assembly from ultra-deep sequencing data",
author = "Serghei Mangul and Wu, {Nicholas C.} and Nicholas Mancuso and Alex Zelikovsky and Ren Sun and Eleazar Eskin",
year = "2014",
month = "6",
day = "15",
doi = "10.1093/bioinformatics/btu295",
language = "English (US)",
volume = "30",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}
