# Research Brief: How to build a virtual cell
*Generated: 2026-03-09 19:18*
*Audience: computational biologists and AI researchers | Goal: understand the key technical challenges in building virtual cells*

---

## Key Claims (draft — verify before using)

*(Fill in after reading sources below)*

1. [Claim 1 — with supporting data point]
2. [Claim 2 — with supporting data point]
3. [Claim 3 — with supporting data point]

---

## arXiv Papers

### 1. BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
*Authors: Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06576v1*

The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation and …

### 2. An ode to instantons
*Authors: Oliver Janssen, Joel Karlsson, Flavio Riccardi | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06575v1*

We present a formalism for semiclassical time evolution in quantum mechanics, building on a century of work. We identify complex saddle points in real time, real saddle points in complex time, and complex saddle points in complex time that reproduce the known answers in classic problems. For the decay of a metastable state, we find finite time and finite energy analogs of the "bounce" which do not …

### 3. SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation
*Authors: Vishal Thengane, Zhaochong An, Tianjin Huang | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06572v1*

Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remains underexplored for 3D point clouds. Existing methods suffer from catastrophic forgetting or fail to learn discriminative prototypes under sparse supervision, and often overlook a key cue: novel categories frequently appear as unlabelled background in …

### 4. Third-order mixed electroweak-QCD corrections to the W-boson mass prediction from the muon lifetime
*Authors: Ievgen Dubovyk, Ayres Freitas, Janusz Gluza | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06571v1*

We present the calculation of the so far missing $\mathcal{O}(\alpha^2\alpha_\mathrm{s})$ corrections to the quantity $\Delta r$, which relates the Fermi constant to the W-boson mass, and enables precision predictions of the latter. While the $\mathcal{O}(\alpha^2\alpha_\mathrm{s})$ corrections from diagrams with two closed fermion loops are already known, we here focus on the subset with one closed fermion loop, which is a …

### 5. SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning
*Authors: Alejandra Perez, Anita Rau, Lee White | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06570v1*

Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what comes next. Current surgical AI cannot answer such questions, largely because training data that explicitly encodes surgical reasoning is immensely difficult to annotate at scale. Yet surgical video lectures …

### 6. Multimodal Large Language Models as Image Classifiers
*Authors: Nikita Kisel, Illia Volkov, Klara Janouskova | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06578v1*

Multimodal Large Language Models (MLLM) classification performance depends critically on evaluation protocol and ground truth quality. Studies comparing MLLMs with supervised and vision-language models report conflicting conclusions, and we show these conflicts stem from protocols that either inflate or underestimate performance. Across the most common evaluation protocols, we identify and fix key …

### 7. Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
*Authors: Lijiang Li, Zuwei Long, Yunhang Shen | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06577v1*

While recent multimodal large language models (MLLMs) have made impressive strides, they predominantly employ a conventional autoregressive architecture as their backbone, leaving significant room to explore effective and efficient alternatives in architectural design. Concurrently, recent studies have successfully applied discrete diffusion models to various domains, such as visual understanding …

### 8. Fly360: Omnidirectional Obstacle Avoidance within Drone View
*Authors: Xiangkai Zhang, Dizhe Zhang, WenZhuo Cao | 2026-03-06*
*URL: http://arxiv.org/abs/2603.06573v1*

Obstacle avoidance in unmanned aerial vehicles (UAVs), as a fundamental capability, has gained increasing attention with the growing focus on spatial intelligence. However, current obstacle-avoidance methods mainly depend on limited field-of-view sensors and are ill-suited for UAV scenarios which require full-spatial awareness when the movement direction differs from the UAV's heading. This …

---

## Gaps & Conflicts

*(Agent: flag here where data is thin, disputed, or missing)*

- [ ] [Gap 1]
- [ ] [Conflict 1]

---

## Bo's Relevant Work

- **scGPT** (Cui et al., Nature Methods 2024): Pretrained on 33M cells. Reported tasks: zero-shot cell annotation, perturbation prediction, and multi-omic integration. Best results: [fill in from paper]
- **LUMI-6** (Wang & Li, Cell 2026): 20.3% in vivo gene editing in lung epithelial cells using brominated lipids
- **Xaira Therapeutics**: Building causally-rich perturbation datasets for virtual cell models