Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation


Inspired by dual-process cognitive theory, this paper proposes Fast-Slow Training (FST), in which model parameters serve as slow weights and an optimized context serves as fast weights. FST achieves up to 3x the sample efficiency of parameter-only fine-tuning on reasoning tasks while keeping the model significantly closer to the base model, reducing catastrophic forgetting in sequential task settings.
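The two-timescale idea above can be sketched as a toy update rule. This is an illustrative assumption, not the paper's actual algorithm: `fst_step`, the learning rates, and the anchor term are all hypothetical names chosen to show the shape of a fast/slow decomposition, where fast weights (the optimized context) take large per-task steps while slow weights (the parameters) take small steps with a pull back toward the base model to limit divergence.

```python
# Hypothetical sketch of a fast/slow update (names and hyperparameters are
# illustrative, not from the paper). Slow weights = model parameters, nudged
# gently and anchored to the base model; fast weights = optimized context,
# updated aggressively per task.

def fst_step(slow, base, fast, grad_slow, grad_fast,
             lr_slow=0.01, lr_fast=0.5, anchor=0.1):
    """One combined step: fast weights move quickly; slow weights take a
    small gradient step plus an anchor term pulling them toward the base
    model, which limits drift (and hence forgetting)."""
    new_fast = [f - lr_fast * g for f, g in zip(fast, grad_fast)]
    new_slow = [
        s - lr_slow * g - anchor * (s - b)  # anchor term limits divergence
        for s, b, g in zip(slow, base, grad_slow)
    ]
    return new_slow, new_fast

# Toy usage: with identical unit gradients, the fast weight moves 50x
# farther than the slow weight in a single step.
slow, base, fast = [1.0], [1.0], [0.0]
new_slow, new_fast = fst_step(slow, base, fast,
                              grad_slow=[1.0], grad_fast=[1.0])
```

The design intuition is that the cheap, high-learning-rate fast path absorbs task-specific signal quickly, so the slow path can stay conservative, which is what keeps divergence from the base model low.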

Why it matters

Catastrophic forgetting and sample inefficiency remain key blockers for deploying LLMs in production settings that evolve over time. The fast/slow weight decomposition offers a practical recipe that doesn't require architectural changes.

Importance: 2/5

Continual learning paper — 3x sample efficiency, practical approach without architectural changes
