Motivation: High throughput screening by fluorescence activated cell sorting (FACS) is a common task in protein engineering and directed evolution. It can also be a rate-limiting step if high false positive or negative rates necessitate multiple rounds of enrichment. Current FACS software requires the user to define sorting gates by intuition and is practically limited to two dimensions. In cases when multiple rounds of enrichment are required, the software cannot forecast the enrichment effort required. Results: We have developed CellSort, a support vector machine (SVM) algorithm that identifies optimal sorting gates based on machine learning using positive and negative control populations. CellSort can take advantage of more than two dimensions to enhance the ability to distinguish between populations. We also present a Bayesian approach to predict the number of sorting rounds required to enrich a population from a given library size. This Bayesian approach allowed us to determine strategies for biasing the sorting gates in order to reduce the required number of enrichment rounds. This algorithm should be generally useful for improve sorting outcomes and reducing effort when using FACS.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics