TY - JOUR
T1 - A computational framework to explore large-scale biosynthetic diversity
AU - Navarro-Muñoz, Jorge C.
AU - Selem-Mojica, Nelly
AU - Mullowney, Michael W.
AU - Kautsar, Satria A.
AU - Tryon, James H.
AU - Parkinson, Elizabeth I.
AU - De Los Santos, Emmanuel L.C.
AU - Yeong, Marley
AU - Cruz-Morales, Pablo
AU - Abubucker, Sahar
AU - Roeters, Arne
AU - Lokhorst, Wouter
AU - Fernandez-Guerra, Antonio
AU - Cappelini, Luciana Teresa Dias
AU - Goering, Anthony W.
AU - Thomson, Regan J.
AU - Metcalf, William W.
AU - Kelleher, Neil L.
AU - Barona-Gomez, Francisco
AU - Medema, Marnix H.
N1 - Funding Information:
We thank the following: the ARS of the USDA for providing bacterial strains; H. Sook Ann, Z. Crispino, Y. Kim, N. Ciszek and K. Espejo for generating bacterial culture extracts; R. McClure, M. Robey and G. Miley for assistance with and contributions to metabolomic data collection methods and acquisition; and Dr. Y. Zhang and Dr. Y. Wu of the Integrated Molecular Structure Education and Research Center (IMSERC) at Northwestern University for assistance in acquiring NMR data. Some analyses were carried out using CONABIO’s computing cluster, with funds from the Secretariat of Environment and Natural Resources. We thank K. Blin for technical assistance with setting up the website on the secondarymetabolites.org domain. The research reported in this publication was supported by the Netherlands Organization for Scientific Research (grant no. 863.15.002 to M.H.M.), the Graduate School for Experimental Plant Sciences (grant to M.H.M.); National Institutes of Health (NIH) Genome to Natural Products Network supplementary award (no. U01GM110706 to M.H.M.), CONACyT grants (grant nos. CBS2017_285746 and 2017_051TAMU to F.B.-G.; postdoctoral scholarship 263661 to J.C.N.M.; PhD scholarship 204482 to N.S.M. (who was also supported by the Innovation Secretary of Guanajuato)), the National Cancer Institute of the NIH (award no. F32CA221327 to M.W.M.), the National Institute of General Medical Sciences (award no. F32GM120999 to E.I.P.), the São Paulo Research Foundation (FAPESP, grant no. 17/08038-8 to L.T.D.C.), the National Center for Complementary and Integrative Health of the NIH (award no. R01AT009143 to R.J.T. and N.L.K.) and Warwick Integrative Synthetic Biology Centre, a UK Synthetic Biology Research grant from the Biotechnology and Biological Sciences Research Council and Engineering and Physical Sciences Research Council (grant no. BB/M017982/1 to E.L.C.D.L.S). This work made use of the IMSERC at Northwestern University, which has received support from the NIH (grant nos. 1S10OD012016-01/1S10RR019071-01A1), the State of Illinois and the International Institute for Nanotechnology. A.F.-G. received funding from the European Union’s Horizon 2020 research and innovation program (Blue Growth: Unlocking the Potential of Seas and Oceans; grant agreement no. 634486).
Publisher Copyright:
© 2019, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the ‘biosynthetic gene similarity clustering and prospecting engine’ (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the ‘core analysis of syntenic orthologues to prioritize natural product gene clusters’ (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues.
AB - Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the ‘biosynthetic gene similarity clustering and prospecting engine’ (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the ‘core analysis of syntenic orthologues to prioritize natural product gene clusters’ (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues.
UR - http://www.scopus.com/inward/record.url?scp=85075417325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075417325&partnerID=8YFLogxK
U2 - 10.1038/s41589-019-0400-9
DO - 10.1038/s41589-019-0400-9
M3 - Article
C2 - 31768033
AN - SCOPUS:85075417325
SN - 1552-4450
VL - 16
SP - 60
EP - 68
JO - Nature Chemical Biology
JF - Nature Chemical Biology
IS - 1
ER -