Abstract
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale 1. However, the energetics driving folding are invisible in these structures and remain largely unknown 2. The hidden thermodynamics of folding can drive disease 3,4, shape protein evolution 5–7 and guide protein engineering 8–10, and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40–72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
Original language | English (US) |
---|---|
Pages (from-to) | 434-444 |
Number of pages | 11 |
Journal | Nature |
Volume | 620 |
Issue number | 7973 |
State | Published - Aug 10 2023 |
Funding
The authors thank Epsilon Molecular Engineering (EME) for providing cnvK linker for cDNA display, Rush University and Genome Research Core at University of Illinois Chicago for performing next-generation sequencing, and D. Minh, T. Whitehead, K. Lindorff-Larsen, D. M. McCandlish, J. Maguire, J. Chodera, P. Hosseinzadeh, and the members of the Rocklin laboratory for discussions and comments on the manuscript. This work was supported by Northwestern University Startup Funding (G.J.R.), JSPS KAKENHI 19J30003 (K.T.), Human Frontier Science Program Long-Term Fellowship (K.T.) and JST PRESTO Grant JPMJPR21E9 (K.T.). This research was supported in part through the computational resources and staff contributions provided for the Quest high performance computing facility at Northwestern University which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology.
ASJC Scopus subject areas
- General