Synthetic DNA barcodes identify singlets in scRNA-seq datasets and evaluate doublet algorithms

Ziyang Zhang, Madeline E. Melzer*, Keerthana M. Arun, Hanxiao Sun, Carl Johan Eriksson, Itai Fabian, Sagi Shaashua, Karun Kiani, Yaara Oren, Yogesh Goyal*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Single-cell RNA sequencing (scRNA-seq) datasets contain true single cells, or singlets, in addition to cells that coalesce during the protocol, or doublets. Identifying singlets with high fidelity in scRNA-seq is necessary to avoid false negative and false positive discoveries. Although several methodologies have been proposed, they are typically tested on highly heterogeneous datasets and lack a priori knowledge of true singlets. Here, we leveraged datasets with synthetically introduced DNA barcodes for a hitherto unexplored application: to extract ground-truth singlets. We demonstrated the feasibility of our framework, “singletCode,” to evaluate existing doublet detection methods across a range of contexts. We also leveraged our ground-truth singlets to train a proof-of-concept machine learning classifier, which outperformed other doublet detection algorithms. Our integrative framework can identify ground-truth singlets and enable robust doublet detection in non-barcoded datasets.

Original language: English (US)
Article number: 100592
Journal: Cell Genomics
Volume: 4
Issue number: 7
DOIs
State: Published - Jul 10 2024

Funding

We thank members of the Goyal lab (especially Ian Mellis, Nitu Kumari, Emanuelle Grody, and Aurelia Leona) and Jeff Mold for helpful discussions and comments on the manuscript. We thank Dane Vassiliadis and Mark Dawson (SPLINTR), Amy Brock (ClonMapper), Kunal Jindal and Samantha Morris (CellTag-multi), Caleb Weinreb and Allon Klein (LARRY), and Michael Ratz (TREX) for barcoded datasets. Y.G. acknowledges support from Northwestern University's startup and Burroughs Wellcome Fund Career Awards at the Scientific Interface. Z.Z. and M.E.M. were supported by funds to Y.G. K.K. acknowledges support from the University of Pennsylvania MSTP. M.E.M. and Y.G. acknowledge support from the National Institute for Theory and Mathematics in Biology through the National Science Foundation (DMS-2235451) and the Simons Foundation (MPTMPS-00005320). Y.O. is supported by the Azrieli Faculty Fellowship. Y.G. is a CZ Biohub Investigator.

Y.G. conceived and designed the study. Z.Z. and M.E.M. designed and performed a majority of the analyses. K.M.A., K.K., and C.-J.E. performed a subset of the analysis. K.M.A. assisted with the singletCode package and website. M.E.M. and Y.G. prepared a majority of the figures and tables, with input from H.S. I.F., S.S., and Y.O. performed the experiments. Y.G., Z.Z., and M.E.M. wrote the manuscript, with input from all authors.

Y.G. received consultancy fees from the Schmidt Science Fellows.

The entire first draft was written without the use of any generative AI. To improve the wording of a sentence, the authors used ChatGPT, but such instances were rare. After using ChatGPT, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Keywords

  • barcoding
  • benchmarking
  • doublet detection
  • lineage tracing
  • machine learning
  • scRNA-seq
  • single-cell genomics
  • singletCode
  • singlets

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Genetics
