Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability

Gang Wu*, Ming-Yang Kao, Guohui Lin, Jia Huai You

*Corresponding author for this work

Research output: Contribution to journalArticle

5 Citations (Scopus)

Abstract

Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability. Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately 1/1+q2+1/2q4 +1/16q5, where q=p/1-p with running time of O(n5), which is at least 0.984 when p < 0.05. Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true " phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.

Original languageEnglish (US)
Article number1
JournalAlgorithms for Molecular Biology
Volume3
Issue number1
DOIs
StatePublished - Jan 24 2008

Fingerprint

Phylogeny
Polynomial time
Polynomials
Topology
Computational Biology
Theoretical Analysis
Efficient Algorithms
Model-based
Lower bound

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
  • Molecular Biology
  • Structural Biology

Cite this

@article{a7012f9b081a4eaf8dd07921d54d5925,
title = "Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability",
abstract = "Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known {"}true{"} phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the {"}true{"} phylogeny with a high success probability. Results: The first algorithm can reconstruct the {"}true{"} phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the {"}true{"} phylogeny with a probability approximately 1 - p in O(n4log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately 1/1+q2+1/2q4 +1/16q5, where q=p/1-p with running time of O(n5), which is at least 0.984 when p < 0.05. Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the {"}true {"} phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.",
author = "Gang Wu and Ming-Yang Kao and Guohui Lin and You, {Jia Huai}",
year = "2008",
month = "1",
day = "24",
doi = "10.1186/1748-7188-3-1",
language = "English (US)",
volume = "3",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BioMed Central",
number = "1",

}

Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability. / Wu, Gang; Kao, Ming-Yang; Lin, Guohui; You, Jia Huai.

In: Algorithms for Molecular Biology, Vol. 3, No. 1, 1, 24.01.2008.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability

AU - Wu, Gang

AU - Kao, Ming-Yang

AU - Lin, Guohui

AU - You, Jia Huai

PY - 2008/1/24

Y1 - 2008/1/24

N2 - Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability. Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately 1/1+q2+1/2q4 +1/16q5, where q=p/1-p with running time of O(n5), which is at least 0.984 when p < 0.05. Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true " phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.

AB - Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability. Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately 1/1+q2+1/2q4 +1/16q5, where q=p/1-p with running time of O(n5), which is at least 0.984 when p < 0.05. Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true " phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.

UR - http://www.scopus.com/inward/record.url?scp=40649125833&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=40649125833&partnerID=8YFLogxK

U2 - 10.1186/1748-7188-3-1

DO - 10.1186/1748-7188-3-1

M3 - Article

C2 - 18218120

AN - SCOPUS:40649125833

VL - 3

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

SN - 1748-7188

IS - 1

M1 - 1

ER -