# Replogle 2022: Mapping Information-Rich Genotype-Phenotype Landscapes with Genome-Scale Perturb-seq

**Authors:** Joseph M. Replogle, Reuben A. Saunders, Angela N. Pogson, Jeffrey A. Hussmann, Alexander Lenail, Alina Guna, Lauren Mascibroda, Eric J. Wagner, et al., Jonathan S. Weissman
**Year:** 2022
**Venue:** Cell, vol. 185 (2022)
**DOI:** 10.1016/j.cell.2022.05.013
**bioRxiv:** 2021.12.16.473013

## One-sentence summary

Genome-scale CRISPRi Perturb-seq targeting all ~10,000 expressed genes in K562 cells, plus ~2,000 common essential genes in both K562 and RPE1, across more than 2.5 million human cells: the largest single-cell perturbation dataset at the time of publication and a foundational reference for Virtual Cell development.

## Key contribution

First **genome-scale** single-cell perturbation atlas. Previous Perturb-seq studies covered hundreds of genes; this one targets all ~10k expressed genes in K562, with a matched screen of ~2k common essential genes in K562 and RPE1, generating more than 2.5 million single-cell profiles. The result is a systematic map of gene function from transcriptional phenotypes.

## Experimental design

- **Cell lines:** K562 (myeloid leukemia) and RPE1 (retinal pigment epithelium)
- **Perturbation type:** CRISPRi (gene knockdown via dCas9-KRAB)
- **Scale:** ~10,000 expressed genes genome-wide in K562; ~2,000 common essential genes in both K562 and RPE1
- **Readout:** 10x scRNA-seq, more than 2.5 million cells total
- **Design:** Single-gene knockdowns only (no combinatorial); 2 guide RNAs per gene, both targeting the same gene
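
To make the design concrete, here is a minimal sketch of how a knockdown "transcriptional phenotype" might be derived from this kind of data: average the cells that received guides for one gene and subtract the average of non-targeting control cells. The gene names, expression values, and the `non-targeting` label are all made-up placeholders, and the paper's own normalization is more involved than a simple mean difference.

```python
from statistics import mean

# Synthetic toy data (not from the paper): each cell is a pair of
# (guide target, {gene: normalized expression}). "non-targeting" is a
# hypothetical label for control cells.
cells = [
    ("non-targeting", {"GENE_A": 5.0, "GENE_B": 8.0, "GENE_C": 6.0}),
    ("non-targeting", {"GENE_A": 5.2, "GENE_B": 7.6, "GENE_C": 6.2}),
    ("GENE_A",        {"GENE_A": 1.0, "GENE_B": 3.1, "GENE_C": 6.1}),
    ("GENE_A",        {"GENE_A": 1.4, "GENE_B": 2.9, "GENE_C": 5.9}),
]


def pseudobulk(cells, target):
    """Mean expression per gene over all cells assigned to `target`."""
    group = [expr for t, expr in cells if t == target]
    if not group:
        raise ValueError(f"no cells found for target {target!r}")
    return {gene: mean(expr[gene] for expr in group) for gene in group[0]}


def signature(cells, target, control="non-targeting"):
    """Per-gene difference between perturbed and control pseudobulk means."""
    perturbed = pseudobulk(cells, target)
    baseline = pseudobulk(cells, control)
    return {gene: perturbed[gene] - baseline[gene] for gene in perturbed}


sig = signature(cells, "GENE_A")
for gene, delta in sig.items():
    print(f"{gene}: {delta:+.2f}")
```

In this toy example the targeted gene itself shows the largest drop, which is the basic sanity check that a CRISPRi knockdown worked.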

## Key findings

- Predicts function of **hundreds of previously uncharacterized genes** from transcriptional phenotypes
- Identifies new regulators of ribosome biogenesis, transcription, and mitochondrial respiration
- Reveals complex cellular phenomena: RNA processing, stress responses, differentiation trajectories
- Systematically identifies genetic drivers and consequences of aneuploidy
- Single-cell resolution reveals cell-state-specific responses (different subpopulations respond differently)
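
The idea behind predicting function from transcriptional phenotypes can be illustrated with a "guilt by association" sketch: if knocking down an uncharacterized gene produces a signature that correlates strongly with knockdowns of known genes, it plausibly acts in the same process. The signatures and gene labels below are invented for illustration, and plain Pearson correlation stands in for whatever similarity measure and clustering a real analysis would use.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical knockdown signatures (expression change per readout gene).
# Labels and numbers are synthetic placeholders, not data from the paper.
signatures = {
    "KNOWN_RIBO_1": [-2.0, -1.5, 0.1, 0.0],
    "KNOWN_RIBO_2": [-1.8, -1.6, 0.0, 0.2],
    "KNOWN_MITO_1": [0.1, 0.0, -2.2, -1.9],
    "UNCHARACTERIZED_X": [-1.7, -1.2, 0.3, 0.1],
}


def nearest_neighbors(query, signatures):
    """Rank all other perturbations by similarity to the query signature."""
    q = signatures[query]
    scores = [
        (name, pearson(q, sig))
        for name, sig in signatures.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranked = nearest_neighbors("UNCHARACTERIZED_X", signatures)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

Here the uncharacterized perturbation ranks closest to the two ribosome-like signatures and is anti-correlated with the mitochondrial one, which is the pattern that would motivate a functional hypothesis.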

## Significance for the field

This dataset is the **Perturb-seq atlas** that the field has been waiting for. It provides:
1. Ground truth for ~10k single-gene knockdowns (in K562)
2. Two cell lines enabling some cross-context comparison (on the ~2k essential genes profiled in both)
3. Scale sufficient to train/evaluate perturbation prediction models
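
When a dataset like this is used to evaluate perturbation prediction models, one common setup is to hold out whole perturbations rather than random cells, so the model is tested on knockdowns it has never seen. The sketch below shows one way to do that split; the function name, the `non-targeting` control label, and the toy labels are assumptions for illustration, not part of any published benchmark.

```python
import random


def split_by_perturbation(cell_labels, test_fraction=0.2,
                          control="non-targeting", seed=0):
    """Split cell indices so that held-out perturbations never appear in train.

    Control cells always stay in the training set.
    """
    perturbations = sorted(set(cell_labels) - {control})
    rng = random.Random(seed)
    rng.shuffle(perturbations)
    n_test = max(1, round(len(perturbations) * test_fraction))
    held_out = set(perturbations[:n_test])
    train_idx = [i for i, p in enumerate(cell_labels) if p not in held_out]
    test_idx = [i for i, p in enumerate(cell_labels) if p in held_out]
    return train_idx, test_idx, held_out


# Toy per-cell perturbation labels (synthetic placeholders).
labels = [
    "non-targeting", "GENE_A", "GENE_A", "GENE_B", "GENE_C",
    "GENE_C", "GENE_D", "GENE_E", "non-targeting",
]
train_idx, test_idx, held_out = split_by_perturbation(labels)
print("held-out perturbations:", sorted(held_out))
print("train cells:", len(train_idx), "test cells:", len(test_idx))
```

Splitting by perturbation instead of by cell avoids leakage: a random cell-level split would let the model see other cells from the same knockdown during training.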

## Current gap it highlights

Even this dataset only covers:
- 2 cell lines (both immortalized, neither primary)
- Single-gene knockdowns only (no combinations)
- CRISPRi only (no CRISPRa, no small molecules)
- ~10k of ~20k human genes, and only in K562 (expressed genes only); RPE1 coverage is limited to ~2k essential genes

Still far from the causally rich, contextually diverse atlas needed for a generalizable Virtual Cell.

## Connections

- [../concepts/perturbation-biology.md](../concepts/perturbation-biology.md) — largest single resource in this space
- [../concepts/virtual-cell.md](../concepts/virtual-cell.md) — the data that makes Virtual Cell training possible
- [../papers/gears.md](gears.md) — uses Replogle as training data
- [../papers/norman-2019.md](norman-2019.md) — predecessor; combinatorial but smaller scale
- [../entities/weissman-lab.md](../entities/weissman-lab.md) — same lab
- [../entities/xaira-therapeutics.md](../entities/xaira-therapeutics.md) — Xaira aims to build a Replogle++ across many contexts

## Bo's notes

Replogle 2022 is currently the most commonly used "big" perturbation dataset. But from Xaira's perspective, it's still deeply limited: 2 cell lines, no combinations, no drugs, no disease context. The argument at Xaira is exactly this: Replogle-scale data, but diverse in context (primary cells, disease states, combination perturbations). That's what the Virtual Cell actually needs.