How to Build a Virtual Cell

From mechanistic models to foundation models

Two decades of trying

E-Cell (1997): simulate a minimal organism — required manually encoding every reaction
Whole-cell model of M. genitalium (Karr et al. 2012): 525 genes; took years to build and did not generalize to other organisms
Single-cell sequencing changed the data supply: 100M+ cells now in CZ CELLxGENE

scGPT trained on 33M cells → cell type annotation accuracy rivals expert curation

💡 Timeline from 1997 to 2026 showing virtual cell (VC) 1.0 vs VC 2.0

GEARS: 0.81 R² on seen gene pairs → 0.47 on unseen combinations

💡 Two scatter plots: seen perturbations (tight) vs unseen (diffuse)
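
A minimal sketch of how a seen-vs-unseen comparison like the one above could be computed, assuming each test record holds a gene pair, an observed effect, and a predicted effect. The gene labels, the numbers, and the helper names (`r_squared`, `split_by_seen`, `score`) are hypothetical toy choices for illustration; this is not the GEARS evaluation code, and the toy values are not meant to reproduce the figures quoted above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def split_by_seen(records, train_pairs):
    """Partition test records by whether the gene pair appeared in training.

    Pairs are compared as unordered sets, so (A, B) matches (B, A).
    """
    train = {frozenset(pair) for pair in train_pairs}
    seen, unseen = [], []
    for rec in records:
        bucket = seen if frozenset(rec["pair"]) in train else unseen
        bucket.append(rec)
    return seen, unseen


def score(records):
    """R^2 between observed and predicted effects for a group of records."""
    observed = [r["observed"] for r in records]
    predicted = [r["predicted"] for r in records]
    return r_squared(observed, predicted)


# Hypothetical toy data: gene labels and effect sizes are made up.
train_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
test_records = [
    {"pair": ("GENE_B", "GENE_A"), "observed": 1.0, "predicted": 1.1},
    {"pair": ("GENE_A", "GENE_B"), "observed": 2.0, "predicted": 1.9},
    {"pair": ("GENE_C", "GENE_D"), "observed": 3.0, "predicted": 3.0},
    {"pair": ("GENE_A", "GENE_C"), "observed": 1.0, "predicted": 2.0},
    {"pair": ("GENE_B", "GENE_D"), "observed": 2.0, "predicted": 2.0},
    {"pair": ("GENE_A", "GENE_D"), "observed": 3.0, "predicted": 2.0},
]

seen, unseen = split_by_seen(test_records, train_pairs)
seen_r2 = score(seen)
unseen_r2 = score(unseen)
print(f"seen R^2:   {seen_r2:.2f}")
print(f"unseen R^2: {unseen_r2:.2f}")
```

Reporting the two groups separately is the point: a single pooled R² would hide the gap between combinations the model has trained on and combinations it has never encountered.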

Causally-rich perturbation datasets at scale — not just more cells, but more diverse contexts
Multi-omics: RNA + chromatin + protein + spatial in the same model
Validation against real wet-lab outcomes, not benchmark proxies

The goal: a 10x improvement in hit rate at the drug discovery screening stage

💡 Data diversity diagram: tissues × organisms × perturbation types