Αlpha¹ Record ID: 5462737A
Manuscript Under Review

X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models

Authors

Wang, C.; Karimzadeh, M.; Ravindra, N. G.; Bounds, L. R.; Alerasool, N.; Huang, A. C.; Ma, S.; Gulbranson, D. R.; Cui, H.; Lee et al.

Abstract

Causal models of cellular systems hold the promise of empowering broad biological discovery, including the systematic identification of novel targets for drug discovery. Predicting how genetic and pathway perturbations reshape gene expression across diverse cellular contexts is a prerequisite for building generalizable cellular foundation models. However, current methods typically fail to extrapolate beyond their training distributions because they rely predominantly on observational expression atlases rather than interventional perturbation data. We present X-Atlas/Pisces, the largest genome-wide CRISPRi Perturb-seq compendium to date, comprising 25.6 million perturbed single-cell transcriptomes across 16 biologically diverse contexts, including widely used cell lines, induced pluripotent stem cells (iPSCs), resting and CD3/CD28-activated Jurkat T lymphoma cells, and multi-lineage differentiating iPSCs. Leveraging this resource, we develop X-Cell, a diffusion language model that predicts perturbation responses by iteratively refining control-to-perturbed state transitions through cross-attention to multimodal biological priors derived from natural language, protein language models, interaction networks, genetic dependency maps, and morphological profiles. X-Cell outperforms existing state-of-the-art models by up to five-fold on key metrics such as PearsonΔ (the correlation between predicted and observed perturbation-induced log-fold changes) and demonstrates zero-shot prediction of T cell-inactivating perturbations in stimulated Jurkat cells. We scale X-Cell to 4.9 billion parameters (X-Cell-Ultra), the largest causal perturbation model to date. We demonstrate for the first time that perturbation prediction follows power-law scaling, with an exponent matching that of large language models.
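To make the PearsonΔ metric concrete, here is a minimal sketch in Python. It assumes each cell population is a cells × genes matrix of already log-normalized expression (plain nested lists), and it takes each gene's log-fold change as the difference between its mean perturbed and mean control expression. The helper names (`mean_profile`, `pearson`, `pearson_delta`) are hypothetical and not taken from the X-Cell release, and the paper's exact normalization and gene selection may differ.

```python
import math
from typing import Sequence

# Cells x genes matrix of log-normalized expression (assumed input format).
Matrix = Sequence[Sequence[float]]


def mean_profile(cells: Matrix) -> list[float]:
    """Average expression of each gene across all cells in a population."""
    n_cells = len(cells)
    n_genes = len(cells[0])
    return [sum(cell[g] for cell in cells) / n_cells for g in range(n_genes)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def pearson_delta(control: Matrix, observed: Matrix, predicted: Matrix) -> float:
    """Hypothetical helper: correlate predicted vs. observed shifts from control."""
    ctrl = mean_profile(control)
    # Log-fold change per gene, taken here as a difference of means in log space.
    observed_delta = [o - c for o, c in zip(mean_profile(observed), ctrl)]
    predicted_delta = [p - c for p, c in zip(mean_profile(predicted), ctrl)]
    return pearson(predicted_delta, observed_delta)


# Toy data: 2 cells x 3 genes per population.
control = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
observed = [[2.0, 2.0, 1.0], [2.0, 2.0, 1.0]]     # delta = [1, 0, -2]
predicted = [[3.0, 2.0, -1.0], [3.0, 2.0, -1.0]]  # delta = [2, 0, -4]
print(round(pearson_delta(control, observed, predicted), 6))  # -> 1.0
```

Subtracting the control profile before correlating is what the Δ buys: raw predicted and observed profiles tend to correlate highly simply because both track baseline expression levels, which can flatter a model that ignores the perturbation entirely. Note that the toy prediction scores 1.0 despite overshooting the effect size by 2×, since correlation is insensitive to scale.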
X-Cell-Ultra demonstrates zero-shot generalization to novel biological contexts, including unseen iPSC-derived melanocyte progenitors and primary human CD4+ T cells from multiple donors, and outperforms all baselines after self-supervised test-time adaptation. These results demonstrate that coordinated scaling of causal perturbation data and model capacity yields foundation models capable of generalizable perturbation prediction across cellular contexts, with potential applications for improving computational target identification, validation, and context-specific therapeutic prioritization.
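A power-law scaling claim of this kind is usually checked with a straight-line fit in log-log space. The sketch below assumes a relationship of the form L(N) = a · N^(-alpha) between an error or loss L and a scale variable N (model parameters or training data), and estimates a and alpha by ordinary least squares on the logarithms. The function name and the synthetic numbers are illustrative assumptions, not code or results from the paper.

```python
import math
from typing import Sequence


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> tuple[float, float]:
    """Fit losses ~= a * sizes**(-alpha) by least squares in log-log space.

    Fits the line log L = log a - alpha * log N, so alpha is the negated
    slope and a is exp(intercept). Returns (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic example: losses generated from an exact power law (a = 2.0, alpha = 0.08).
sizes = [1e7, 1e8, 1e9, 1e10]
losses = [2.0 * n ** -0.08 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")  # -> a = 2.000, alpha = 0.080
```

On real measurements the points will scatter around the line, and the fitted exponent is what gets compared across model families; a fit over only a few sizes spanning a narrow range can give an unstable exponent.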

Αlpha¹ Highlights
Cheeky Summary

Xaira’s massive 4.9B-parameter “virtual cell” acts like a diffusion-language fortune teller, zero-shot predicting gene expression chaos from CRISPR hits across unseen cell types using a whopping 25M+ perturbed transcriptomes in pure LLM power-law style.

Community Buzz

Announced by Bo Wang (@BoWang87) as part of Xaira’s virtual-cell efforts, building on a highly engaged earlier thread. The post drew likes and comments from AI-bio folks hyping the causal scaling results and the upcoming data/model release.

Peer Reviews

Peer review in progress...

More to Read

CD4⁺ T cells confer transplantable rejuvenation via Rivers of telomeres

Lanna, A.; Valvo, S.; Dustin, M.; Rinaldi, F.

Under Review · 391

Using a GPT-5-driven autonomous lab to optimize the cost and titer of cell-free protein synthesis

Smith, A. A.; Wong, E. L.; Donovan, R. C.; Chapman, B. A.; Harry, R.; Tirandazi, P.; Kanigowska, P.; Gendreau, E. A.; Dahl, R. H.; Jastrzebski, M.; Cortez, J. E.; Bremner, C. J.; Hemuda, J. C. M.; Dooner, J.; Graves, I.; Karandikar, R.; Lionetti, C.; Christopher, K.; Consiglio, A. L.; Tran, A.; McCusker, W.; Nguyen, D. X.; Nunes da Silva, I. B.; Bautista-Ayala, A. R.; McNerney, M. P.; Atkins, S.; McDuffie, M.; Serber, W.; Barber, B. P.; Thanongsinh, T.; Nesson, A.; Lama, B.; Nichols, B.; LaFrance, C.; Nyima, T.; Byrn, A.; Thornhill, R.; Cai, B.; Ayala-Valdez, L.; Wong, A.; Che, A. J.; Thavaraj

Reviewed · 193

A Single-Cell and Spatial 3D Multi-omic Atlas of Developing Human Basal Ganglia and Inhibitory Neurons

Heffel, M. G.; Xu, H.; Pastor-Alonso, O.; Li, X.; Baig, M. S.; Irfan Ghoor, R.; Li, R.; Kern, C.; Kum, J.; Zhang, Y.; Paino, J.; Tsai, M. J.; Tai, C.-Y.; Tucker, G.; Zhao, Z.; Hou, A.; von Behren, Z.; Bhade, M.; Li, S.; Sandoval, K.; Scholes, J.; Codrea, F.; Calimlim, J.; Liao, E. K.; Leung, G.; Kim, J.; Eskin, E.; Flint, J.; Cotter, J. A.; Pasaniuc, B.; Bintu, B.; Zhu, Q.; Mukamel, E. A.; Ernst, J.; Paredes, M. F.; Luo, C.

Under Review · 62

Prediction of transformative breakthroughs in biomedical research

Davis, M. T.; Busse, B. L.; Arabi, S.; Meyer, P.; Hoppe, T. A.; Meseroll, R. A.; Hutchins, B. I.; Willis, K. A.; Santangelo, G. M.

Under Review · 26