Fang, T.; Wang, X.; Xiao, Z.; Hang, S.; Murtaza, G.; Yang, J.; Xu, H.; Jha, A.; Noble, W. S.; Wang et al.
Understanding how genomic sequences shape three-dimensional (3D) genome architecture is fundamental to interpreting diverse biological processes. Although previous studies have shown that sequence information can predict 3D genome architecture, they fall short in capturing cell type-specific structures because they are trained solely on sequence inputs. The widely available Hi-C data, which contain rich structural information across biosamples, can provide complementary features to sequence data for studying cell type-specific architectures. Recently, DNA foundation models have demonstrated encouraging performance in capturing long-range genomic dependencies, holding promise for modeling chromatin interactions. However, the extremely high computational cost of running these models limits their applicability to Hi-C analysis, which requires genome-wide sequence embeddings. Here, we present Evo2HiC, a multimodal foundation model that jointly models genomic sequences and structures to study cell type-specific chromatin structure. The key idea of Evo2HiC is to distill a large-scale DNA foundation model, Evo 2 (7B), into a compact encoder, while guiding the distillation with Hi-C data to preserve genomic features critical for 3D genome analysis. The model supports two types of encoders: one that operates directly on DNA sequences, and a second that additionally takes the corresponding Hi-C data as input. Using the DNA-only encoder to predict Hi-C contact matrices, Evo2HiC improved Spearman correlation by 10.9% over Orca. Moreover, by jointly embedding Hi-C and sequence information, Evo2HiC achieved the best overall Pearson correlation when predicting five representative epigenomic assays. Interpretation analysis of Evo2HiC revealed its ability to identify cell type-specific sequence motifs that explain changes in epigenomic signals.
Finally, we demonstrated the cross-species generalizability of Evo2HiC on 177 species from the DNA Zoo dataset for Hi-C resolution enhancement. In summary, Evo2HiC is a foundation model that integrates genome sequences and 3D chromatin structure information, substantially reduces computational cost while maintaining state-of-the-art accuracy on predicting various epigenomic signals and genome architecture, enables the identification of cell type-specific motifs, and demonstrates robust generalizability across species.
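The abstract reports Spearman correlation between predicted and observed Hi-C contact matrices. As a rough illustration of that kind of metric, the sketch below ranks the upper-triangle entries of two symmetric matrices and correlates the ranks. The function names, the toy matrices, and the choice to pool the whole upper triangle are assumptions made for illustration; the abstract does not say how the authors compute the score (for example, whether it is stratified by genomic distance).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def contact_map_spearman(predicted, observed):
    """Hypothetical helper: rank-correlate two symmetric contact matrices."""
    return spearman(upper_triangle(predicted), upper_triangle(observed))


# Toy 3x3 contact matrices (made-up values, not data from the paper).
observed = [[10, 5, 1], [5, 8, 4], [1, 4, 9]]
predicted = [[9.0, 4.5, 0.5], [4.5, 7.0, 3.0], [0.5, 3.0, 8.0]]
score = contact_map_spearman(predicted, observed)
```

Because the metric depends only on rank order, a prediction that is off in absolute scale but preserves the ordering of contact strengths still scores highly, which is one reason rank correlation is a common choice for comparing contact maps.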
CD4⁺ T cells confer transplantable rejuvenation via Rivers of telomeres
Lanna, A.; Valvo, S.; Dustin, M.; Rinaldi, F.
Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis
Smith, A. A.; Wong, E. L.; Donovan, R. C.; Chapman, B. A.; Harry, R.; Tirandazi, P.; Kanigowska, P.; Gendreau, E. A.; Dahl, R. H.; Jastrzebski, M.; Cortez, J. E.; Bremner, C. J.; Hemuda, J. C. M.; Dooner, J.; Graves, I.; Karandikar, R.; Lionetti, C.; Christopher, K.; Consiglio, A. L.; Tran, A.; McCusker, W.; Nguyen, D. X.; Nunes da Silva, I. B.; Bautista-Ayala, A. R.; McNerney, M. P.; Atkins, S.; McDuffie, M.; Serber, W.; Barber, B. P.; Thanongsinh, T.; Nesson, A.; Lama, B.; Nichols, B.; LaFrance, C.; Nyima, T.; Byrn, A.; Thornhill, R.; Cai, B.; Ayala-Valdez, L.; Wong, A.; Che, A. J.; Thavaraj
A Single-Cell and Spatial 3D Multi-omic Atlas of Developing Human Basal Ganglia and Inhibitory Neurons
Heffel, M. G.; Xu, H.; Pastor-Alonso, O.; Li, X.; Baig, M. S.; Irfan Ghoor, R.; Li, R.; Kern, C.; Kum, J.; Zhang, Y.; Paino, J.; Tsai, M. J.; Tai, C.-Y.; Tucker, G.; Zhao, Z.; Hou, A.; von Behren, Z.; Bhade, M.; Li, S.; Sandoval, K.; Scholes, J.; Codrea, F.; Calimlim, J.; Liao, E. K.; Leung, G.; Kim, J.; Eskin, E.; Flint, J.; Cotter, J. A.; Pasaniuc, B.; Bintu, B.; Zhu, Q.; Mukamel, E. A.; Ernst, J.; Paredes, M. F.; Luo, C.
Prediction of transformative breakthroughs in biomedical research
Davis, M. T.; Busse, B. L.; Arabi, S.; Meyer, P.; Hoppe, T. A.; Meseroll, R. A.; Hutchins, B. I.; Willis, K. A.; Santangelo, G. M.