High-throughput discovery of protein energy landscapes in natural and designed proteomes

Project: Research project

Project Details


All folded proteins continuously fluctuate between their low-energy native structures and higher-energy “hidden” conformations that can be partially or fully unfolded. Although each protein molecule spends only a small fraction of its time in these high-energy conformations, the states have major physiological consequences. Partially folded states of natural proteins can lead to protein aggregation, organ failure, and death. Partially folded states of therapeutic proteins can induce dangerous anti-drug antibodies. The energetic balance between the native, folded state and these diverse higher-energy states – in other words, the overall “energy landscape” – is thus critically important in protein aggregation and immunogenicity, as well as in allostery, signaling, off-target drug interactions, and numerous other phenomena. Despite decades of research into energy landscapes, our overall understanding is very limited: few proteins have been characterized in depth, accurate computational predictions are very challenging, and experimental measurements are expensive, slow, and labor-intensive. We propose a transformational approach to understanding protein energy landscapes by integrating a new massively parallel experimental method, machine learning, and protein design. First, we are developing a new high-throughput assay using hydrogen exchange mass spectrometry to measure energy landscapes for thousands of proteins in parallel. This method finally brings the study of protein energy landscapes into the “omics” age. Critically, these experiments reveal both the overall folding stability and the energies of conformational fluctuations in each protein. For a subset of proteins (tens to hundreds), these parallel experiments reveal the specific sites of conformational fluctuations as well.
Using this approach, we will measure the energy landscapes of thousands of natural proteins and tens of thousands of computationally designed proteins custom-built to systematically probe how specific properties affect energy landscapes. We will then train machine learning models to predict energy landscapes from sequence and structure, and optimize physical force fields to accurately model high-energy protein states. We will also catalyze advances in modeling throughout the community by organizing large-scale competitions in blind prediction of energy landscapes. Finally, with these new predictive models in hand, we will pursue a unique application: the development of energetically optimized screening libraries for therapeutic protein and biological probe discovery. This overcomes a major challenge in drug and probe development. In sum, this study provides the experimental and computational tools to bring hidden protein states to light quantitatively on a massive scale. This fundamentally shifts our perspective: instead of examining energy landscapes only when they cause problems, we can make energy landscape analysis a central tool in biology and bioengineering.
Effective start/end date: 9/30/20 to 5/31/25


  • National Institute of General Medical Sciences (1DP2GM140927-01)

