Abstract
Single-cell RNA sequencing (scRNA-seq) datasets contain true single cells, or singlets, in addition to cells that coalesce during the protocol, or doublets. Identifying singlets with high fidelity in scRNA-seq is necessary to avoid false negative and false positive discoveries. Although several methodologies have been proposed, they are typically tested on highly heterogeneous datasets and lack a priori knowledge of true singlets. Here, we leveraged datasets with synthetically introduced DNA barcodes for a hitherto unexplored application: to extract ground-truth singlets. We demonstrated the feasibility of our framework, “singletCode,” to evaluate existing doublet detection methods across a range of contexts. We also leveraged our ground-truth singlets to train a proof-of-concept machine learning classifier, which outperformed other doublet detection algorithms. Our integrative framework can identify ground-truth singlets and enable robust doublet detection in non-barcoded datasets.
Original language | English (US) |
---|---|
Article number | 100592 |
Journal | Cell Genomics |
Volume | 4 |
Issue number | 7 |
State | Published - Jul 10 2024 |
Funding
We thank members of the Goyal lab (especially Ian Mellis, Nitu Kumari, Emanuelle Grody, and Aurelia Leona) and Jeff Mold for helpful discussions and comments on the manuscript. We thank Dane Vassiliadis and Mark Dawson (SPLINTR), Amy Brock (ClonMapper), Kunal Jindal and Samantha Morris (CellTag-multi), Caleb Weinreb and Allon Klein (LARRY), and Michael Ratz (TREX) for barcoded datasets. Y.G. acknowledges support from Northwestern University's startup and Burroughs Wellcome Fund Career Awards at the Scientific Interface. Z.Z. and M.E.M. were supported by funds to Y.G. K.K. acknowledges support from the University of Pennsylvania MSTP. M.E.M. and Y.G. acknowledge support from the National Institute for Theory and Mathematics in Biology through the National Science Foundation (DMS-2235451) and the Simons Foundation (MPTMPS-00005320). Y.O. is supported by the Azrieli Faculty Fellowship. Y.G. is a CZ Biohub Investigator.
Y.G. conceived and designed the study. Z.Z. and M.E.M. designed and performed a majority of the analyses. K.M.A., K.K., and C.-J.E. performed a subset of the analysis. K.M.A. assisted with the singletCode package and website. M.E.M. and Y.G. prepared a majority of the figures and tables, with input from H.S. I.F., S.S., and Y.O. performed the experiments. Y.G., Z.Z., and M.E.M. wrote the manuscript, with input from all authors.
Y.G. received consultancy fees from the Schmidt Science Fellows.
The entire first draft was written without the use of any generative AI. To improve the wording of a sentence, the authors used ChatGPT, but such instances were rare. After using ChatGPT, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Keywords
- barcoding
- benchmarking
- doublet detection
- lineage tracing
- machine learning
- scRNA-seq
- single-cell genomics
- singletCode
- singlets
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology (miscellaneous)
- Genetics