Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation
Inspired by dual-process cognitive theory, this paper proposes Fast-Slow Training (FST), in which model parameters serve as slow weights and an optimized context serves as fast weights. On reasoning tasks, FST achieves up to 3x greater sample efficiency than parameter-only fine-tuning while keeping divergence from the base model significantly lower, which reduces catastrophic forgetting in sequential task settings.
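A minimal sketch of what such a dual-weight loop could look like, assuming the fast weights are a trainable soft prompt prepended to the input and the slow weights are the base parameters updated gently and infrequently. All names here (ToyLM, soft_prompt, slow_every) are hypothetical illustrations, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a base LM that accepts a learned soft prompt."""
    def __init__(self, d_model=64, vocab=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, vocab)

    def forward(self, token_ids, soft_prompt):
        x = self.embed(token_ids)                       # (B, T, D)
        prompt = soft_prompt.expand(x.size(0), -1, -1)  # (B, P, D)
        h = self.body(torch.cat([prompt, x], dim=1))
        return self.head(h[:, prompt.size(1):])         # logits for real tokens

model = ToyLM()
# Fast weights: a short optimized context, updated aggressively every step.
soft_prompt = nn.Parameter(torch.zeros(1, 8, 64))
fast_opt = torch.optim.Adam([soft_prompt], lr=1e-2)
# Slow weights: the base parameters, nudged rarely with a tiny LR
# so the model stays close to its pretrained state.
slow_opt = torch.optim.Adam(model.parameters(), lr=1e-5)

def train_step(batch_ids, targets, step, slow_every=8):
    logits = model(batch_ids, soft_prompt)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    fast_opt.zero_grad()
    slow_opt.zero_grad()
    loss.backward()
    fast_opt.step()              # fast weights absorb task-specific signal
    if step % slow_every == 0:   # slow weights consolidate occasionally
        slow_opt.step()
    return loss.item()
```

The design intuition: the fast weights soak up task-specific adaptation quickly, so the slow weights need only small, infrequent updates and therefore drift less from the base model, which is the mechanism the paper credits for reduced forgetting.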
Why it matters
Catastrophic forgetting and sample inefficiency remain key blockers for deploying LLMs in production settings that evolve over time. The fast/slow weight decomposition offers a practical recipe that requires no architectural changes.
Importance: 2/5
Continual learning paper: 3x sample efficiency, practical approach without architectural changes