Abstract
Retrobiosynthesis tools harness the inherent promiscuities of enzymes for the de novo design of novel biosynthetic pathways to key small molecules. Many existing pathway search algorithms rely on exhaustively enumerating the space of all possible enzymatic reactions using generalized rules, followed by an extensive analysis of the ensuing reaction network to extract candidate pathways for experimental validation. While this approach is comprehensive, the permissiveness of such reaction rules often generates many false-positive reactions. Here, we have developed DORA-XGB, an enzymatic reaction feasibility classifier. DORA-XGB can be used within our DORAnet framework to assess whether newly enumerated enzymatic reactions and pathways would be feasible. To curate a training dataset for our model, we extracted enzymatic reactions from public databases and screened them for their general thermodynamic feasibility. We then considered alternate reaction centers on known substrates to strategically generate infeasible reactions with high confidence, thereby circumventing the lack of negative data in the literature. In training our model, we also experimented with various molecular fingerprinting techniques and configurations for assembling reaction fingerprints, taking into account not just primary substrate and primary product structures, but cofactor structures as well. Our model's utility is demonstrated through favorable benchmarking against a previously published classifier, the successful recovery of newly published reactions, and the ranking of previously predicted pathways for the biosynthesis of propionic acid from pyruvate.
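As a rough illustration of what assembling a reaction fingerprint from substrate, product, and cofactor structures could look like, the sketch below concatenates three fixed-length bit vectors into a single reaction vector. It is not the DORA-XGB implementation: `toy_fingerprint` is a hypothetical stand-in that hashes SMILES substrings rather than computing a real molecular fingerprint, concatenation is only one possible configuration, and treating CO2 as the cofactor in the pyruvate decarboxylation example is an assumption made purely for illustration.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hypothetical stand-in for a molecular fingerprint.

    Hashes every SMILES substring of up to max_len characters into a
    fixed-length bit vector. This is NOT a chemically meaningful fingerprint.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


def reaction_fingerprint(substrate, product, cofactors, n_bits=64):
    """Concatenate substrate, product, and merged cofactor bit vectors.

    Cofactor fingerprints are merged with a bitwise OR so the reaction
    vector has the same length regardless of how many cofactors there are.
    """
    cofactor_bits = [0] * n_bits
    for smiles in cofactors:
        for i, bit in enumerate(toy_fingerprint(smiles, n_bits)):
            cofactor_bits[i] |= bit
    return (
        toy_fingerprint(substrate, n_bits)
        + toy_fingerprint(product, n_bits)
        + cofactor_bits
    )


# Illustrative reaction: pyruvate -> acetaldehyde, with CO2 treated as cofactor.
pyruvate = "CC(=O)C(=O)O"
acetaldehyde = "CC=O"
co2 = "O=C=O"

fp = reaction_fingerprint(pyruvate, acetaldehyde, [co2])
print(len(fp))  # 3 * n_bits entries: substrate, product, cofactor segments
```

A vector of this kind, paired with a feasible or infeasible label, is the sort of input a gradient-boosted classifier could be trained on; the actual fingerprint type and assembly configuration used by the model are described in the paper itself.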
Original language | English (US) |
---|---|
Pages (from-to) | 129-142 |
Number of pages | 14 |
Journal | Molecular Systems Design and Engineering |
Volume | 10 |
Issue number | 2 |
State | Published - Nov 2 2024 |
Funding
The authors would like to thank Dr. Christopher Henry, Dr. Danielle Tullman-Ercek, Dr. Jacob Martin, Dr. Tracey Dinh, Dr. Bapi Mandal, Dr. Sai Praneet Batchu, Shivani Kozarekar, Stefan Pate, Geoffrey Bonnanzio, Rawia Marafi, and Margaret Guilarte-Silva for their invaluable insights and constructive discussions. Funding for Yash Chainani for this study was provided in part by the Northwestern University Graduate School Cluster Fellowship in Biotechnology, Systems, and Synthetic Biology, which is affiliated with the Biotechnology Training Program, and in part by the DOE Joint BioEnergy Institute (https://www.jbei.org), supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under Award Number DE-AC02-05CH11231 with Lawrence Berkeley National Laboratory. Funding for Zhuofu Ni for this study was provided in part by the U.S. Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research under Award Number DE-SC0018249, and in part by an Institute of Sustainability and Energy at Northwestern (ISEN) Fellowship. This research project was supported in part through the computational resources and staff contributions provided by the Quest high performance computing facility at Northwestern University, which is jointly supported by the Office of the Provost, the Office of Research, and Northwestern University Information Technology. This research also used resources of the National Energy Research Scientific Computing Center (NERSC), a Department of Energy Office of Science User Facility, using NERSC award ERCAP0028489.
ASJC Scopus subject areas
- Chemistry (miscellaneous)
- Chemical Engineering (miscellaneous)
- Biomedical Engineering
- Energy Engineering and Power Technology
- Process Chemistry and Technology
- Industrial and Manufacturing Engineering
- Materials Chemistry