# Building Foundation Models That Reason
### BioReason: From Genomic Sequences to Biological Insight
Bo Wang · Xaira Therapeutics · University of Toronto · Vector Institute
NIH, 2026

---

## Biology's Grand Challenge: Beyond Pattern Matching
- Foundation models have transformed NLP, vision, and protein structure prediction
- But biological reasoning requires more than feature extraction
- Expert biologists integrate sequence, structure, pathway context, and evolutionary history
- Current models excel at representation but struggle with explanation

**The gap: prediction accuracy vs. mechanistic understanding**

---

## What DNA Foundation Models Get Right — and Wrong
- Strong at local sequence representation (Nucleotide Transformer, HyenaDNA, DNABERT-2)
- Reasonable at variant effect scoring within the training distribution
- Fail on multi-step causal reasoning: sequence → regulation → pathway → disease
- Outputs are black-box scores with no traceable biological logic

**Variant effect prediction R² < 0.5 on novel regulatory contexts**

---

# What if a model could not just predict — but explain its reasoning, step by step?

---

## BioReason: Tightly Coupling DNA Models with Language Reasoning
- Integrate a DNA foundation model directly with a large language model (LLM)
- LLM interprets genomic embeddings as structured biological context
- Train with supervised fine-tuning on chain-of-thought biological deductions
- Apply reinforcement learning to reward logically coherent, biologically accurate reasoning

**NeurIPS 2025 | github.com/bowang-lab/BioReason**

---

## BioReason Architecture: DNA Embeddings as Reasoning Tokens
- DNA encoder produces dense sequence representations from any genomic locus
- A learned projection bridges the genomic embedding space and LLM token space
- LLM attends jointly over sequence context and natural language reasoning history
- Single end-to-end model — no retrieval, no external lookup
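
The bridging step above can be illustrated with a toy sketch. It assumes the learned projection is a single linear map and that projected DNA vectors are prepended to the text token embeddings; the slide states neither, and `make_projection`, `project`, and `build_llm_input` are hypothetical names. Random weights stand in for learned ones, and the output is simply the joint sequence an LLM would attend over.

```python
import random

def make_projection(d_dna, d_llm, seed=0):
    """Random stand-in for a learned linear projection (weights W, bias b)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.02) for _ in range(d_llm)] for _ in range(d_dna)]
    b = [0.0] * d_llm
    return W, b

def project(dna_embeddings, W, b):
    """Map each DNA position embedding (width d_dna) to a vector of width d_llm."""
    return [
        [sum(x * W[i][j] for i, x in enumerate(vec)) + b[j] for j in range(len(b))]
        for vec in dna_embeddings
    ]

def build_llm_input(dna_embeddings, text_embeddings, W, b):
    """Assumed layout: projected DNA vectors first, then text token embeddings."""
    return project(dna_embeddings, W, b) + list(text_embeddings)

# Toy sizes: 5 DNA positions of width 8, 3 text tokens of width 16.
D_DNA, D_LLM = 8, 16
W, b = make_projection(D_DNA, D_LLM)
dna = [[0.1 * (p + k) for k in range(D_DNA)] for p in range(5)]
text = [[0.0] * D_LLM for _ in range(3)]
sequence = build_llm_input(dna, text, W, b)
print(len(sequence), len(sequence[0]))  # 8 positions, each of width 16
```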

---

## Training BioReason: SFT + RL for Biological Coherence
- Supervised fine-tuning on curated chain-of-thought biological reasoning examples
- Reinforcement learning reward signal: factual accuracy + pathway coherence + parsimony
- Model learns to produce verifiable deduction steps, not just associations
- Generalizes to biological entities never seen during training
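
One way to picture the three-part reward signal is as a weighted sum. This is a sketch only: the weights, the step-count parsimony term, and the name `composite_reward` are illustrative assumptions, not the reward used to train BioReason.

```python
def composite_reward(accuracy, coherence, n_steps,
                     weights=(0.6, 0.3, 0.1), max_steps=10):
    """Combine three scores into one scalar reward in [0, 1].

    accuracy  : 1.0 if the final answer matches the label, else 0.0 (assumed)
    coherence : fraction of reasoning steps judged pathway-consistent (assumed)
    n_steps   : number of reasoning steps; fewer steps score higher on parsimony
    """
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= coherence <= 1.0):
        raise ValueError("accuracy and coherence must lie in [0, 1]")
    # Parsimony falls linearly with trace length and is floored at zero.
    parsimony = max(0.0, 1.0 - n_steps / max_steps)
    w_acc, w_coh, w_par = weights
    return w_acc * accuracy + w_coh * coherence + w_par * parsimony

# A correct answer with a mostly coherent four-step trace.
print(round(composite_reward(1.0, 0.8, 4), 3))
```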

---

## BioReason Results: Disease Pathway Prediction
- Task: KEGG-based disease pathway prediction from genomic sequence alone
- Baseline (DNA foundation model): 86% accuracy
- BioReason: 98% accuracy — 12-point absolute gain
- Model provides step-by-step pathway attribution for each prediction

**86% → 98% on KEGG disease pathway prediction**
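
The 12-point gain can also be read as a drop in error rate from 14% to 2%. The short calculation below derives both views from the two accuracies on this slide; the helper name `accuracy_gain` is illustrative.

```python
def accuracy_gain(baseline, model):
    """Return (absolute gain in points, relative error reduction).

    Both inputs are accuracies in percent; baseline must be below 100.
    """
    absolute = model - baseline
    base_err, model_err = 100.0 - baseline, 100.0 - model
    relative_error_reduction = (base_err - model_err) / base_err
    return absolute, relative_error_reduction

abs_gain, rel_red = accuracy_gain(86.0, 98.0)
print(abs_gain, round(rel_red, 3))  # 12.0 0.857
```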

---

## BioReason Results: Variant Effect Prediction
- Benchmarked against strong baselines across multiple variant effect tasks
- Average improvement: +15% over best prior models
- Critically: BioReason explains *why* a variant is functional
- Reasoning traces mention regulatory elements, TF binding, splicing effects — without being told to

**+15% average improvement on variant effect prediction benchmarks**

---

# BioReason-1 showed reasoning works for DNA.
BioReason-Pro extends this to the protein universe.

---

## The Protein Annotation Problem
- ~250 million proteins in UniProt; <1% have experimental functional annotations
- Current methods: sequence similarity (misses functional convergence) or isolated classifiers
- Gene Ontology (GO) has 45,000+ terms with complex hierarchical dependencies
- No existing method integrates sequence, structure, domains, and interactions together

**Annotation gap: ~249M proteins remain poorly characterized**

---

## GO-GPT: Modeling the Ontology as a Language
- Autoregressive transformer trained to predict GO terms in dependency order
- Captures hierarchical (parent-child) and cross-aspect (MF/BP/CC) dependencies
- Generates GO annotations as structured sequences, not independent binary predictions
- Serves as the precise ontology backbone for BioReason-Pro
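
"Dependency order" can be read as "every parent term is emitted before its children". The sketch below builds such an ordering for a toy ontology fragment using the standard-library `graphlib` module. The edges are simplified (intermediate GO terms are omitted), and `with_ancestors` and `dependency_order` are hypothetical helpers, not part of GO-GPT.

```python
from graphlib import TopologicalSorter

# Toy is_a edges, child -> set of parents (simplified, intermediates omitted).
PARENTS = {
    "molecular_function": set(),
    "catalytic activity": {"molecular_function"},
    "transferase activity": {"catalytic activity"},
    "kinase activity": {"transferase activity"},
    "binding": {"molecular_function"},
    "ATP binding": {"binding"},
}

def with_ancestors(terms, parents):
    """Close a term set under the parent relation, so ancestors are included."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed

def dependency_order(terms, parents):
    """Order terms so every parent precedes its children (ties alphabetical)."""
    closed = with_ancestors(terms, parents)
    sorter = TopologicalSorter({t: parents[t] for t in sorted(closed)})
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a deterministic sequence
        order.extend(ready)
        sorter.done(*ready)
    return order

order = dependency_order({"kinase activity", "ATP binding"}, PARENTS)
print(order)
```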

---

## BioReason-Pro: Multimodal Reasoning for Protein Function
- Input: protein sequence embeddings + GO-GPT predictions + structural features
- Generates structured reasoning traces that mirror expert biological inference
- Trained on 130,000+ synthetic reasoning traces generated by GPT-5
- Further optimized with reinforcement learning for functional coherence

**bioRxiv 2026 | bioreason.net**

---

## BioReason-Pro: Quantitative Results
- GO term prediction: 73.6% Fmax — state-of-the-art across all GO aspects
- LLM-judge evaluation of functional summaries: 8.0 / 10
- Outperforms ESM-2, ProtTrans, and all prior GO prediction methods
- Generalizes to novel protein families with no training examples

**73.6% Fmax on GO term prediction**
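
For readers unfamiliar with the metric, Fmax is the best F1 score obtained over all decision thresholds. The sketch below assumes the protein-centric definition used in CAFA-style evaluations, which this slide does not specify. The function name, the data layout, and the toy scores are illustrative only.

```python
def fmax(predictions, truths, thresholds=None):
    """Protein-centric Fmax.

    predictions : {protein: {term: score in [0, 1]}}
    truths      : {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision averages only proteins with a prediction at t.
                precisions.append(hits / len(predicted))
            # Recall averages over every protein in the benchmark.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best

# Toy example with two proteins and three terms.
preds = {"P1": {"A": 0.9, "B": 0.4}, "P2": {"A": 0.6, "C": 0.2}}
truth = {"P1": {"A"}, "P2": {"A", "C"}}
score = fmax(preds, truth)
print(round(score, 3))  # 0.857
```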

---

## Human Expert Validation: Preferred Over Ground Truth
- Blind evaluation: protein function experts compared BioReason-Pro vs UniProt annotations
- BioReason-Pro preferred in 79% of head-to-head comparisons
- Experts cite: richer mechanistic context, integration of interaction evidence, clarity
- First model whose AI-generated annotations are preferred over a curated gold-standard database

**Experts preferred BioReason-Pro over UniProt annotations in 79% of head-to-head comparisons**

---

## De Novo Binding Partner Prediction with Structural Validation
- BioReason-Pro predicted binding partners for proteins with no prior interaction data
- Predictions experimentally confirmed by independent cryo-EM studies
- Per-residue attention weights localized to exact contact residues in cryo-EM structures
- Demonstrates genuine mechanistic understanding — not statistical correlation

**Attention weights map precisely to cryo-EM contact residues**
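
One simple way to quantify how well attention localizes to contact residues is top-k overlap: take the k residues with the highest attention and measure how many are known contacts. This is a sketch of that check only. The slide does not say how localization was measured, and the function names and toy numbers are hypothetical.

```python
def top_k_residues(attention, k):
    """Return the indices of the k residues with the highest attention weight."""
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return set(ranked[:k])

def contact_overlap(attention, contact_residues, k=None):
    """Precision and recall of top-k attention residues against known contacts."""
    contacts = set(contact_residues)
    k = len(contacts) if k is None else k
    top = top_k_residues(attention, k)
    hits = len(top & contacts)
    return hits / k, hits / len(contacts)

# Toy per-residue attention for an 8-residue protein; contacts at 2, 3, and 7.
attention = [0.01, 0.02, 0.30, 0.25, 0.03, 0.20, 0.04, 0.15]
precision, recall = contact_overlap(attention, {2, 3, 7})
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```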

---

## A Unified Picture: The BioReason Framework
- Modality-agnostic: applies to DNA (BioReason-1) and protein (BioReason-Pro)
- Core recipe: domain-specific encoder + LLM + SFT on reasoning traces + RL
- Outputs are interpretable, traceable, and can be audited by biologists
- Framework generalizes: RNA, epigenomics, metabolomics are natural next steps

---

## What Reasoning Unlocks That Prediction Cannot
- Scientific trust: reviewers and regulators can inspect the model's logic
- Error diagnosis: failures are traceable to specific reasoning steps
- Knowledge integration: model can incorporate new facts at inference time
- Discovery: reasoning traces reveal hypotheses not in training data

---

## Future Directions
- BioReason for single-cell: reasoning over cell state transitions and perturbations
- Multi-agent BioReason: specialized sub-agents debating biological hypotheses
- Closed-loop integration with lab automation for hypothesis-experiment feedback
- Clinical translation: variant interpretation for rare disease diagnosis

---

## Summary
- BioReason-1 (NeurIPS 2025): DNA-LLM integration with RL achieves 98% disease pathway accuracy and a +15% average improvement on variant effect prediction
- BioReason-Pro (2026): multimodal protein reasoning achieves 73.6% Fmax, preferred by experts 79% of the time
- Core insight: reasoning transforms foundation models from predictors into scientific partners
- Code, data, checkpoints: github.com/bowang-lab/BioReason · bioreason.net

**Building foundation models that reason — not just predict**

---
