Abstract
Analyzing sequencing-based microbial count data is challenging because the data contain numerous non-biological zeros, which can impede downstream analysis. To tackle this issue, we introduce two novel approaches, PhyImpute and UniFracImpute, which leverage similar microbial samples to identify and impute non-biological zeros in microbial count data. The proposed methods use the probability of non-biological zeros and phylogenetic trees to estimate sample-to-sample similarity. We evaluate both methods on simulated and real microbial data. The results demonstrate that PhyImpute and UniFracImpute outperform existing methods in recovering the zeros and in empowering downstream analyses such as differential abundance analysis and disease status classification.
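The general idea of borrowing information from similar samples to fill in zeros can be illustrated with a minimal sketch. This is not the PhyImpute or UniFracImpute algorithm: it swaps in a plain Euclidean distance on log-transformed counts where the abstract describes phylogeny-informed similarity, and it replaces the probability model for non-biological zeros with a simple threshold. The function names, the parameters `k` and `min_nonzero_frac`, and the toy count table are all hypothetical and chosen only for illustration.

```python
import math


def log_distance(a, b):
    """Euclidean distance between two samples on the log1p scale.

    Assumption: a stand-in for a phylogeny-aware distance such as UniFrac.
    """
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)))


def impute_zeros(counts, k=2, min_nonzero_frac=0.5):
    """Return a float copy of `counts` with selected zeros filled in.

    counts: list of samples, each a list of per-taxon counts.
    A zero is filled only when more than `min_nonzero_frac` of the k nearest
    samples have a nonzero count for that taxon (hypothetical rule, standing
    in for an estimated probability that the zero is non-biological).
    """
    imputed = [[float(v) for v in row] for row in counts]
    for i, sample in enumerate(counts):
        # Rank all other samples by distance to sample i and keep the k closest.
        others = [j for j in range(len(counts)) if j != i]
        neighbors = sorted(others, key=lambda j: log_distance(sample, counts[j]))[:k]
        if not neighbors:
            continue
        for t, value in enumerate(sample):
            if value != 0:
                continue  # observed nonzero counts are never changed
            neighbor_values = [counts[j][t] for j in neighbors]
            nonzero = sum(1 for v in neighbor_values if v > 0)
            if nonzero / len(neighbors) > min_nonzero_frac:
                # Fill the zero with the unweighted mean of the neighbors' counts.
                imputed[i][t] = sum(neighbor_values) / len(neighbor_values)
    return imputed


# Toy table: rows are samples, columns are taxa (made-up numbers).
counts = [
    [10, 0, 5, 0],
    [12, 4, 6, 0],
    [11, 6, 5, 0],
    [0, 0, 1, 80],
]
result = impute_zeros(counts, k=2)
print(result[0])  # [10.0, 5.0, 5.0, 0.0]
```

In the first sample, the zero for the second taxon is filled because both nearest samples carry that taxon, while the zero for the fourth taxon is kept because neither neighbor has it. The sketch has no distance cutoff, so a sample without any close match still borrows values from whichever samples happen to be nearest.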
Original language | English (US) |
---|---|
Article number | bbae653 |
Journal | Briefings in Bioinformatics |
Volume | 26 |
Issue number | 1 |
State | Published - Jan 1 2025 |
Funding
This work has been partially supported by the United States Department of Agriculture (ARZT-1361620-H22-149 to L.A.) and the National Institutes of Health (R01AI149754 and R01ES027013 to Y.C.).
Keywords
- imputation
- metagenomics
- microbiome
- phylogenetic tree
ASJC Scopus subject areas
- Information Systems
- Molecular Biology