Accurate protein stability prediction for small domains using mega-scale experiments
Cho, Y.; Tsuboyama, K.; Litberg, T. J.; Jung, M. D.; Obisesan, A.; Wang, Q.; Phoumyvong, C. M.; Thibeault, J.; Ovchinnikov, S.; Rocklin et al.
Loading
Fetching the latest research
Cho, Y.; Tsuboyama, K.; Litberg, T. J.; Jung, M. D.; Obisesan, A.; Wang, Q.; Phoumyvong, C. M.; Thibeault, J.; Ovchinnikov, S.; Rocklin et al.
Predicting absolute protein folding stability is a long-standing challenge in biophysics, with broad applications in protein design and in understanding genetic variation and evolution. Physics-based simulations have shown limited success at predicting stability and are often computationally intractable, and machine learning methods have been constrained by the lack of sufficiently large experimental datasets. We recently introduced cDNA display proteolysis, a cell-free approach that can measure folding stability for nearly one million protein domains in parallel. Here, we applied this method to measure stability for 1.8 million diverse protein domains 60-80 amino acids in length primarily taken from the MGnify metagenomic database and spanning over 200,000 sequence families. Using this new "MGnify Stability dataset", we developed the predictive models SaProt{Delta}G and ESM3{Delta}G, which accurately predict absolute folding stability for small domains with root mean squared error of 0.8 kcal/mol over a 6 kcal/mol range (Spearman rank correlation of 0.88). These predictors show high accuracy at predicting effects of substitutions, insertions, and deletions, successfully identify global trends toward higher stability in thermophilic organisms, and improve discrimination of stable and unstable computationally designed proteins. Our results illustrate how megascale biophysical measurements can complement existing evolutionary and structural data to enable accurate absolute stability prediction for small domains.
Scientists measured a whopping 1.8 million protein stabilities in a massive data bonanza, then built a predictor so sharp it turns tiny domain design into a reliable game—think of it as a quirky molecular fortune-teller that actually gets the folding future right most of the time.
Shared by Gabriel Rocklin (@grocklin) and Állan Ferrari (@ajrferrari) highlighting the dataset quality and modeling power, sparking praise from protein designers and biophysicists
View discussion on XPeer review in progress...
Loading...
CD4⁺ T cells confer transplantable rejuvenation via Rivers of telomeres
Lanna, A.; Valvo, S.; Dustin, M.; Rinaldi, F.
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis
Smith, A. A.; Wong, E. L.; Donovan, R. C.; Chapman, B. A.; Harry, R.; Tirandazi, P.; Kanigowska, P.; Gendreau, E. A.; Dahl, R. H.; Jastrzebski, M.; Cortez, J. E.; Bremner, C. J.; Hemuda, J. C. M.; Dooner, J.; Graves, I.; Karandikar, R.; Lionetti, C.; Christopher, K.; Consiglio, A. L.; Tran, A.; McCusker, W.; Nguyen, D. X.; Nunes da Silva, I. B.; Bautista-Ayala, A. R.; McNerney, M. P.; Atkins, S.; McDuffie, M.; Serber, W.; Barber, B. P.; Thanongsinh, T.; Nesson, A.; Lama, B.; Nichols, B.; LaFrance, C.; Nyima, T.; Byrn, A.; Thornhill, R.; Cai, B.; Ayala-Valdez, L.; Wong, A.; Che, A. J.; Thavaraj
A Single-Cell and Spatial 3D Multi-omic Atlas of Developing Human Basal Ganglia and Inhibitory Neurons
Heffel, M. G.; Xu, H.; Pastor-Alonso, O.; Li, X.; Baig, M. S.; Irfan Ghoor, R.; Li, R.; Kern, C.; Kum, J.; Zhang, Y.; Paino, J.; Tsai, M. J.; Tai, C.-Y.; Tucker, G.; Zhao, Z.; Hou, A.; von Behren, Z.; Bhade, M.; Li, S.; Sandoval, K.; Scholes, J.; Codrea, F.; Calimlim, J.; Liao, E. K.; Leung, G.; Kim, J.; Eskin, E.; Flint, J.; Cotter, J. A.; Pasaniuc, B.; Bintu, B.; Zhu, Q.; Mukamel, E. A.; Ernst, J.; Paredes, M. F.; Luo, C.
Prediction of transformative breakthroughs in biomedical research
Davis, M. T.; Busse, B. L.; Arabi, S.; Meyer, P.; Hoppe, T. A.; Meseroll, R. A.; Hutchins, B. I.; Willis, K. A.; Santangelo, G. M.