Global analysis of protein folding using massively parallel design, synthesis, and testing

Gabriel Rocklin, Tamuka M. Chidyausiku, Inna Goreshnik, Alex Ford, Scott Houliston, Alexander Lemak, Lauren Carter, Rashmi Ravichandran, Vikram K. Mulligan, Aaron Chevalier, Cheryl H. Arrowsmith, David Baker*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

321 Scopus citations

Abstract

Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds, a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment and has the potential to transform computational protein design into a data-driven science.
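The abstract reports stability measured by protease susceptibility and a design success rate that rose across rounds. As a minimal sketch of how such a readout can be turned into a success rate, the Python below assumes that each sequence has an observed EC50 and a predicted unfolded-state EC50, both in log10 protease-concentration units, and treats their difference as a stability score. The `Measurement` class, the `STABLE_THRESHOLD` cutoff of 1.0, and the example values are hypothetical illustrations, not the study's definitions or data.

```python
"""Sketch: stability scores and a design success rate from EC50 values.

Assumptions (illustrative, not taken from the study's code):
- EC50 values are in log10 units of protease concentration.
- The unfolded-state EC50 for each sequence is supplied as an input;
  in practice it would come from a separately fitted unfolded-state model.
- STABLE_THRESHOLD is a hypothetical cutoff chosen for illustration.
"""
from __future__ import annotations

from dataclasses import dataclass

STABLE_THRESHOLD = 1.0  # hypothetical cutoff, in log10 units


@dataclass
class Measurement:
    name: str
    ec50_observed: float  # log10 protease concentration at half-maximal survival
    ec50_unfolded: float  # assumed prediction if the sequence were fully unfolded


def stability_score(m: Measurement) -> float:
    """Protease resistance beyond the unfolded-state prediction (log10 units)."""
    return m.ec50_observed - m.ec50_unfolded


def success_rate(measurements: list[Measurement],
                 threshold: float = STABLE_THRESHOLD) -> float:
    """Fraction of designs whose stability score exceeds the threshold."""
    if not measurements:
        return 0.0
    stable = sum(1 for m in measurements if stability_score(m) > threshold)
    return stable / len(measurements)


if __name__ == "__main__":
    # Made-up values for illustration only; these are not data from the study.
    designs = [
        Measurement("design_a", 3.2, 1.5),
        Measurement("design_b", 2.0, 1.8),
        Measurement("design_c", 4.1, 2.0),
        Measurement("design_d", 1.9, 1.7),
    ]
    for d in designs:
        print(d.name, round(stability_score(d), 2))
    print("success rate:", success_rate(designs))  # 2 of 4 exceed the cutoff
```

Because the score is a difference of log10 values under this assumption, a score of 1 corresponds to tolerating roughly tenfold more protease than the unfolded-state prediction.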

Original language: English (US)
Pages (from-to): 168-175
Number of pages: 8
Journal: Science
Volume: 357
Issue number: 6347
State: Published - Jul 14 2017

Funding

Supported by the Howard Hughes Medical Institute (D.B.) and the Natural Sciences and Engineering Research Council of Canada (C.H.A.). G.J.R. is a Merck Fellow of the Life Sciences Research Foundation. C.H.A. holds a Canada Research Chair in Structural Genomics. We thank S. Rettie for mass spectrometry support; C. Lee for deep sequencing support; S. Ovchinnikov for assistance quantifying sequence conservation; V. Nguyen, A. Yehdego, T. Howard, and K. Lau for assistance with protein purification; and H. Gelman and many other members of the Baker lab for helpful discussions. This work was facilitated by the Hyak supercomputer at the University of Washington and by donations of computing time from Rosetta@Home participants. The Structural Genomics Consortium is a registered charity (number 1097737) that receives funds from AbbVie; Bayer Pharma AG; Boehringer Ingelheim; Canada Foundation for Innovation; Eshelman Institute for Innovation; Genome Canada through Ontario Genomics Institute grant OGI-055; Innovative Medicines Initiative (EU/EFPIA) through ULTRA-DD grant 115766; Janssen Pharmaceuticals; Merck & Co.; Novartis Pharma AG; Ontario Ministry of Research, Innovation and Science (MRIS); Pfizer; São Paulo Research Foundation-FAPESP; Takeda; and the Wellcome Trust.

The RosettaScripts code and blueprint files used for protein design are provided in the supplementary materials. The data for this work (designed sequences and structures, deep sequencing counts, EC50 values, stability scores, and structural analysis of the designed models) are also provided in the supplementary materials. The Python code for inferring EC50 values and for fitting the unfolded state model is provided at https://github.com/asford/protease-experimental-analysis.

G.J.R. and D.B. are inventors on provisional patent application no. 62/491,518, filed 28 April 2017 by the University of Washington, which covers (i) the method described in this work for computationally designing and experimentally verifying stable miniproteins and (ii) the 4000 most stable protein sequences designed in the work.

Author contributions: G.J.R. designed the research, the experimental approach, and the proteins; G.J.R., T.M.C., I.G., S.H., L.C., R.R., and A.C. performed experiments; all authors analyzed data; G.J.R., A.F., and V.K.M. contributed new computational tools; C.H.A. and D.B. supervised research; and G.J.R. and D.B. wrote the manuscript.
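The EC50 inference code itself lives in the protease-experimental-analysis repository. Purely to illustrate what inferring an EC50 from a protease titration involves, the sketch below finds the protease concentration at which half of the protein survives, assuming a logistic curve in log10 concentration with a fixed Hill slope of 1 and fitting it by a simple grid search. The function names `logistic_survival` and `fit_log_ec50`, the grid bounds, and the fixed slope are hypothetical choices for this example; the sketch does not reproduce the repository's model.

```python
"""Sketch: estimate a log10 EC50 from a protease titration by grid search.

Assumptions (illustrative; NOT the protease-experimental-analysis code):
- Surviving fraction follows a logistic curve in log10 protease
  concentration with a fixed Hill slope (default 1).
- `survival` holds fractions in [0, 1] measured at each concentration.
"""
import math


def logistic_survival(log_conc: float, log_ec50: float, slope: float = 1.0) -> float:
    """Expected fraction of intact protein at a given log10 protease concentration."""
    return 1.0 / (1.0 + 10 ** (slope * (log_conc - log_ec50)))


def fit_log_ec50(concentrations, survival, slope=1.0, lo=-3.0, hi=3.0, steps=601):
    """Return the grid value of log10 EC50 that minimizes the squared error."""
    log_concs = [math.log10(c) for c in concentrations]
    best, best_err = lo, math.inf
    for i in range(steps):
        candidate = lo + (hi - lo) * i / (steps - 1)
        err = sum((logistic_survival(x, candidate, slope) - y) ** 2
                  for x, y in zip(log_concs, survival))
        if err < best_err:
            best, best_err = candidate, err
    return best


if __name__ == "__main__":
    # Synthetic, noise-free titration generated from a known log10 EC50 of 0.5.
    concs = [10 ** k for k in (-2, -1, 0, 1, 2)]
    surv = [logistic_survival(math.log10(c), 0.5) for c in concs]
    print(round(fit_log_ec50(concs, surv), 2))  # prints 0.5
```

The grid search keeps the sketch dependency-free; a fit to real, noisy titration data would more likely use a nonlinear least-squares routine and an explicit noise model.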

ASJC Scopus subject areas

  • General
