BoostMEC: predicting CRISPR-Cas9 cleavage efficiency through boosting models

Oscar A. Zarate, Yiben Yang, Xiaozhong Wang, Ji Ping Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Background: In the CRISPR-Cas9 system, the efficiency of genetic modifications varies depending on the single guide RNA (sgRNA) used. A variety of sgRNA properties have been found to be predictive of CRISPR cleavage efficiency, including the position-specific sequence composition of sgRNAs, global sgRNA sequence properties, and thermodynamic features. While existing deep learning-based approaches provide competitive prediction accuracy, a more interpretable model is desirable to help understand how different features contribute to CRISPR-Cas9 cleavage efficiency.

Results: We propose a gradient boosting approach that uses LightGBM to develop an integrated tool, BoostMEC (Boosting Model for Efficient CRISPR), for the prediction of wild-type CRISPR-Cas9 editing efficiency. We benchmark BoostMEC against 10 popular models on 13 external datasets and show its competitive performance.

Conclusions: BoostMEC can provide state-of-the-art predictions of CRISPR-Cas9 cleavage efficiency for sgRNA design and selection. Because it relies on direct and derived features of sgRNA sequences and on conventional machine learning, BoostMEC maintains an advantage over state-of-the-art deep learning-based CRISPR efficiency prediction models: it produces more interpretable feature insights and predictions.
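The abstract names position-specific sequence composition and global sequence properties as predictive sgRNA features. The following is a minimal sketch of how such features might be encoded for a tree-based model. It is not BoostMEC's actual feature set: the function names, the 20-nt example guide, and the choice of GC content as the global property are all illustrative assumptions, and thermodynamic features are not computed here.

```python
from typing import List

BASES = "ACGT"  # assumed DNA alphabet for the guide sequence


def position_one_hot(seq: str) -> List[int]:
    """Position-specific composition: one 0/1 indicator per (position, base)."""
    seq = seq.upper()
    return [1 if base == b else 0 for base in seq for b in BASES]


def gc_content(seq: str) -> float:
    """A simple global sequence property: fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def featurize(seq: str) -> List[float]:
    """Concatenate position-specific indicators with the global GC fraction."""
    if not seq or any(ch not in BASES for ch in seq.upper()):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return [float(x) for x in position_one_hot(seq)] + [gc_content(seq)]


# Hypothetical 20-nt guide, used only to illustrate the encoding.
example = "GACGTTACCGGATCAAGCTA"
features = featurize(example)
print(len(features))  # 4 indicators per position plus one global feature
```

A feature matrix built this way could then be passed to a gradient boosting regressor such as LightGBM's `LGBMRegressor`. Because each column corresponds to a named base at a named position, per-feature importances from the fitted trees can be read directly, which is the kind of interpretability the abstract emphasizes.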

Original language: English (US)
Article number: 446
Journal: BMC Bioinformatics
Issue number: 1
State: Published - Dec 2022


Keywords

  • CRISPR-Cas9
  • Feature engineering
  • Interpretability
  • LightGBM
  • Machine learning
  • Regression trees
  • sgRNA

ASJC Scopus subject areas

  • Applied Mathematics
  • Molecular Biology
  • Structural Biology
  • Biochemistry
  • Computer Science Applications

