Motivation: The urgent need for efficient and sustainable biological production of fuels and high-value chemicals has prompted a wave of in silico techniques for identifying promising novel pathways to these compounds in large putative metabolic networks. To date, these approaches have primarily used general graph search algorithms, which are prohibitively slow because putative metabolic networks may exceed 1 million compounds. To alleviate this limitation, we report two methods, SimIndex (SI) and SimZyme, which use the chemical similarity of 2D chemical fingerprints to efficiently navigate large metabolic networks and to propose enzymatic connections between their constituent nodes. We also report a Byers-Waterman type pathway search algorithm for further paring down pertinent networks.

Results: Benchmarking tests run with SI show that it can reduce the number of nodes visited in searching a putative network by 100-fold, with a computational time improvement of up to 10^5-fold. Subsequent application of the Byers-Waterman search further reduces the number of nodes searched by up to 100-fold, while SimZyme demonstrates ∼90% accuracy in matching query substrates with enzymes. Using these modules, we have designed and annotated an alternative to the methylerythritol phosphate pathway that produces isopentenyl pyrophosphate with more favorable thermodynamics than the native pathway. These algorithms will have a significant impact on our ability to use large metabolic networks that lack annotation of promiscuous reactions.
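The abstract does not describe SI's internals, so the following is only a minimal sketch of the general idea of similarity-guided network navigation, not the authors' algorithm. It assumes that each compound's 2D fingerprint is represented as a set of on-bit indices, that similarity is measured with the Tanimoto coefficient, and that the search is a greedy best-first expansion which always visits next the compound most similar to the target. The function names `tanimoto` and `similarity_guided_search` are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_guided_search(
    graph: Dict[str, List[str]],
    fingerprints: Dict[str, FrozenSet[int]],
    source: str,
    target: str,
) -> Tuple[Optional[List[str]], int]:
    """Greedy best-first search (an assumed stand-in for SI, not the published method).

    Compounds are expanded in order of decreasing Tanimoto similarity to the target.
    Returns (path from source to target or None, number of nodes expanded).
    """
    target_fp = fingerprints[target]
    # heapq is a min-heap, so similarity is negated; the counter breaks ties in insertion order.
    frontier = [(-tanimoto(fingerprints[source], target_fp), 0, source)]
    parent: Dict[str, Optional[str]] = {source: None}
    counter = 1
    expanded = 0
    while frontier:
        _, _, node = heapq.heappop(frontier)
        expanded += 1
        if node == target:
            # Walk parent pointers back to the source to recover the path.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1], expanded
        for nxt in graph.get(node, []):
            if nxt not in parent:  # each compound is queued at most once
                parent[nxt] = node
                score = tanimoto(fingerprints[nxt], target_fp)
                heapq.heappush(frontier, (-score, counter, nxt))
                counter += 1
    return None, expanded


# Toy network with made-up fingerprints (hypothetical data, for illustration only).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["T"], "D": []}
fps = {
    "A": frozenset({1, 2}),
    "B": frozenset({3}),
    "C": frozenset({5, 6, 7}),
    "D": frozenset({3}),
    "T": frozenset({5, 6, 7, 8}),
}
print(similarity_guided_search(graph, fps, "A", "T"))  # (['A', 'C', 'T'], 3)
```

In this toy case the branch through `B` and `D` is never expanded, because `C` is far more similar to the target; that pruning of dissimilar branches is the kind of effect that would reduce the number of visited nodes in a large network. A real implementation would more likely use a cheminformatics toolkit's fingerprints and similarity functions rather than hand-built bit sets.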