TY - JOUR
T1 - Systematic evaluation of transcriptomics-based deconvolution methods and references using thousands of clinical samples
AU - Nadel, Brian B.
AU - Oliva, Meritxell
AU - Shou, Benjamin L.
AU - Mitchell, Keith
AU - Ma, Feiyang
AU - Montoya, Dennis J.
AU - Mouton, Alice
AU - Kim-Hellmuth, Sarah
AU - Stranger, Barbara E.
AU - Pellegrini, Matteo
AU - Mangul, Serghei
N1 - Publisher Copyright:
© 2021 The Author(s). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
PY - 2021/11/1
Y1 - 2021/11/1
N2 - Estimating cell type composition of blood and tissue samples is a biological challenge relevant in both laboratory studies and clinical care. In recent years, a number of computational tools have been developed to estimate cell type abundance using gene expression data. Although these tools use a variety of approaches, they all leverage expression profiles from purified cell types to evaluate the cell type composition within samples. In this study, we compare 12 cell type quantification tools and evaluate their performance while using each of 10 separate reference profiles. Specifically, we have run each tool on over 4000 samples with known cell type proportions, spanning both immune and stromal cell types. A total of 12 of these represent in vitro synthetic mixtures and 300 represent in silico synthetic mixtures prepared using single-cell data. The remaining 3728 clinical samples have been collected from the Framingham cohort, for which cell populations have been quantified using electrical impedance cell counting. When tools are applied to the Framingham dataset, the tool Estimating the Proportions of Immune and Cancer cells (EPIC) produces the highest correlation, whereas Gene Expression Deconvolution Interactive Tool (GEDIT) produces the lowest error. The best tool for the other datasets varies, but CIBERSORT and GEDIT most consistently produce accurate results. We find that the optimal reference depends on the tool used, and report suggested references to be used with each tool. Most tools return results within minutes, but on large datasets runtimes for CIBERSORT can exceed hours or even days. We conclude that deconvolution methods are capable of returning high-quality results, but that proper reference selection is critical.
AB - Estimating cell type composition of blood and tissue samples is a biological challenge relevant in both laboratory studies and clinical care. In recent years, a number of computational tools have been developed to estimate cell type abundance using gene expression data. Although these tools use a variety of approaches, they all leverage expression profiles from purified cell types to evaluate the cell type composition within samples. In this study, we compare 12 cell type quantification tools and evaluate their performance while using each of 10 separate reference profiles. Specifically, we have run each tool on over 4000 samples with known cell type proportions, spanning both immune and stromal cell types. A total of 12 of these represent in vitro synthetic mixtures and 300 represent in silico synthetic mixtures prepared using single-cell data. The remaining 3728 clinical samples have been collected from the Framingham cohort, for which cell populations have been quantified using electrical impedance cell counting. When tools are applied to the Framingham dataset, the tool Estimating the Proportions of Immune and Cancer cells (EPIC) produces the highest correlation, whereas Gene Expression Deconvolution Interactive Tool (GEDIT) produces the lowest error. The best tool for the other datasets varies, but CIBERSORT and GEDIT most consistently produce accurate results. We find that the optimal reference depends on the tool used, and report suggested references to be used with each tool. Most tools return results within minutes, but on large datasets runtimes for CIBERSORT can exceed hours or even days. We conclude that deconvolution methods are capable of returning high-quality results, but that proper reference selection is critical.
KW - benchmarking
KW - cell type deconvolution
KW - cell type quantification
KW - gene expression
UR - http://www.scopus.com/inward/record.url?scp=85121950125&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121950125&partnerID=8YFLogxK
U2 - 10.1093/bib/bbab265
DO - 10.1093/bib/bbab265
M3 - Article
C2 - 34346485
AN - SCOPUS:85121950125
SN - 1467-5463
VL - 22
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 6
M1 - bbab265
ER -