Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference
Google / CMU
This Google/CMU paper (arXiv 2605.26099) proposes a sleep-like memory consolidation mechanism for language models. Periodically, the model converts recent context into persistent fast weights in SSM blocks through N offline recurrent passes, then clears its KV cache. On synthetic tasks (cellular automata, multi-hop graph retrieval) and math reasoning benchmarks, increasing sleep duration N improves performance, with the largest gains on examples requiring deeper multi-step reasoning.
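As a rough illustration of the consolidation loop described above, here is a minimal toy sketch in plain Python. It is not the paper's method: the delta-rule update, the single fast-weight matrix `W` standing in for an SSM block's fast weights, and the names `sleep`, `recall_error`, and `make_cache` are all assumptions made for illustration. It only shows the general shape of the idea: sweep N times over the cached context, fold it into weights, then drop the cache.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def make_cache():
    """Toy 'recent context': three (key, value) pairs with overlapping unit-norm keys."""
    s = 1.0 / math.sqrt(2.0)
    keys = [[s, s, 0.0, 0.0], [0.0, s, s, 0.0], [0.0, 0.0, s, s]]
    values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return list(zip(keys, values))


def sleep(kv_cache, n_passes, lr=1.0):
    """Fold cached pairs into a fast-weight matrix, then clear the cache.

    Assumption: a delta-rule update is used here as a stand-in for
    whatever offline recurrent update the paper actually performs.
    """
    d_k = len(kv_cache[0][0])
    d_v = len(kv_cache[0][1])
    W = [[0.0] * d_k for _ in range(d_v)]
    for _ in range(n_passes):              # N offline passes over the context
        for key, value in kv_cache:
            pred = matvec(W, key)
            for i in range(d_v):
                err = value[i] - pred[i]   # what W still gets wrong for this pair
                for j in range(d_k):
                    W[i][j] += lr * err * key[j]
    kv_cache.clear()                       # the context now lives only in W
    return W


def recall_error(W, pairs):
    """Mean squared error when recalling each value from its key using W alone."""
    total = 0.0
    for key, value in pairs:
        pred = matvec(W, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total / len(pairs)


reference = make_cache()                   # kept only to score recall afterwards
for n in (0, 1, 5, 50):
    cache = make_cache()
    W = sleep(cache, n_passes=n)
    print(n, len(cache), recall_error(W, reference))
```

In this toy setting the keys overlap, so a single pass leaves interference between the stored pairs and further passes reduce it. That loosely mirrors the reported trend that a longer sleep duration N helps, but it says nothing about the paper's actual architecture or results.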
Why it matters
The paper introduces a principled mechanism for converting short-term context into long-term weights. This points toward handling very long contexts without unbounded KV cache growth, a key bottleneck for production inference.
Importance: 2/5
Verified arXiv paper from Google/CMU; novel sleep-inspired memory consolidation with practical implications for long-context inference.