# Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics

**Authors:** Ihor Kendiukhov
**Year:** 2026
**Venue:** arXiv preprint
**arXiv:** 2602.15253

## One-sentence summary

Single-cell masked-reconstruction transformers follow power-law scaling analogous to that seen in NLP, but only when data is sufficient; in data-limited regimes, model size is not the binding constraint.

## Key contribution

First systematic empirical study of scaling laws for single-cell foundation models. Establishes that the NLP scaling-law framework applies to scRNA-seq, with the caveat that power-law scaling appears only when data is plentiful relative to model size.

## Methods

- Two regimes: data-rich (512 highly variable genes (HVGs), 200k cells) and data-limited (1024 genes, 10k cells)
- Seven model sizes: 533 to 340M parameters (3 orders of magnitude)
- Fitted a parametric scaling law to validation mean squared error (MSE)
- Data from CellxGene Census
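
As a rough illustration of the fitting step, the sketch below fits a saturating power law of the form L(N) = a · N^(-α) + c to a set of (parameter count, validation loss) pairs. The functional form is an assumption here (it is the common choice for scaling-law fits with an irreducible floor c); the summary above only states that a parametric law was fitted. The function name, the model sizes, and the losses are hypothetical, and the losses are synthetic values generated from a known curve, not the paper's measurements.

```python
import math


def fit_power_law_with_floor(params, losses, n_grid=2000):
    """Fit L(N) = a * N**(-alpha) + c by scanning candidate floors c.

    For a fixed c below min(losses), log(L - c) is linear in log(N),
    so a and alpha follow from ordinary least squares. The c with the
    smallest squared error in the original loss space is kept.
    """
    xs = [math.log(n) for n in params]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    lowest = min(losses)
    best = None
    for i in range(n_grid):
        c = lowest * i / n_grid  # candidate floors in [0, min(losses))
        ys = [math.log(loss - c) for loss in losses]
        y_mean = sum(ys) / len(ys)
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        a = math.exp(y_mean - slope * x_mean)
        alpha = -slope
        sse = sum((a * n ** (-alpha) + c - loss) ** 2
                  for n, loss in zip(params, losses))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, c)
    _, a, alpha, c = best
    return a, alpha, c


# Synthetic example: hypothetical model sizes and losses from a known curve.
sizes = [10 ** k for k in range(3, 10)]
true_a, true_alpha, true_c = 5.0, 0.3, 1.44
losses = [true_a * n ** (-true_alpha) + true_c for n in sizes]

a, alpha, c = fit_power_law_with_floor(sizes, losses)
print(f"a={a:.3f}, alpha={alpha:.3f}, c={c:.3f}")
```

The grid scan over c avoids a nonlinear optimizer and keeps the example dependency-free. In a data-limited regime, where loss barely changes with model size, a fit like this would be poorly constrained, which is consistent with the "negligible scaling" finding.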

## Key findings

- **Data-rich regime:** Clear power-law scaling; irreducible loss floor c ≈ 1.44; ~2.30 bits of entropy per masked gene
- **Data-limited regime:** Negligible scaling; adding parameters does not reduce validation loss when data is scarce
- Data-to-parameter ratio is the critical determinant of scaling behavior
- Scaling laws analogous to NLP do emerge in scRNA-seq
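
One way to relate an MSE floor to an entropy figure in bits is the differential entropy of a Gaussian whose variance equals the floor. The sketch below applies that conversion to c ≈ 1.44. Whether the paper derives its ~2.30 bits this way is an assumption, and the function name is hypothetical.

```python
import math


def gaussian_entropy_bits(mse):
    """Differential entropy, in bits, of a Gaussian with variance `mse`."""
    return 0.5 * math.log2(2 * math.pi * math.e * mse)


# Loss floor reported in the findings above; the Gaussian residual model
# is an assumption made for this illustration.
bits = gaussian_entropy_bits(1.44)
print(f"{bits:.2f} bits per masked gene")
```

Under this assumption the result lands close to the reported figure; a small gap would be expected if the paper uses a more precise value of c or a different residual model.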

## Implications

- More data is more important than more parameters, up to the scaling equilibrium point
- Current large models (scGPT at 33M cells, TranscriptFormer at ~100M cells) may be near or past their data-limited ceiling on CellxGene-type observational data
- Unlocking the next tier of scaling gains will likely require new kinds of data (diverse perturbation atlases, multi-modal, cross-species)

## Limitations

- Single pretraining objective (masked reconstruction) — JEPA or diffusion may have different scaling curves
- Does not measure downstream task performance, only pretraining loss
- Single author — needs replication

## Connections

- [../concepts/single-cell-foundation-models.md](../concepts/single-cell-foundation-models.md) — quantifies the scaling behavior
- [../concepts/perturbation-biology.md](../concepts/perturbation-biology.md) — data diversity is the next bottleneck
- [../papers/parameter-free-representations.md](parameter-free-representations.md) — complementary challenge: even when scaling works, is it measuring the right thing?