Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference

Google / CMU

Research, official + media, 2 sources, ~1 min read

This Google/CMU paper (arXiv 2605.26099) proposes a sleep-like memory consolidation mechanism for language models. Periodically, the model runs N offline recurrent passes over its recent context to write that context into persistent fast weights in its SSM blocks, then clears its KV cache. On synthetic tasks (cellular automata, multi-hop graph retrieval) and math reasoning benchmarks, increasing the sleep duration N improves performance, with the largest gains on examples that require deeper multi-step reasoning.
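
The consolidate-then-clear loop described above can be illustrated with a toy sketch. This is not the paper's method: the summary does not give the actual update rule or architecture, so the delta-rule update, the class name `ToySleepMemory`, the helper `recall_error`, and all sizes and the learning rate are assumptions chosen only to show the shape of the idea (a short-term cache, N offline passes that write into persistent weights, then a cache reset).

```python
import numpy as np


class ToySleepMemory:
    """Toy stand-in: a short-term KV cache plus a persistent fast-weight matrix.

    Hypothetical illustration only; not the architecture from the paper.
    """

    def __init__(self, dim, lr=1.0):
        self.W = np.zeros((dim, dim))  # persistent "fast weights"
        self.cache = []                # short-term cache of (key, value) pairs
        self.lr = lr

    def observe(self, key, value):
        # Awake phase: new context is only appended to the cache.
        self.cache.append((key, value))

    def sleep(self, n_passes):
        # Offline phase: replay the cache n_passes times, writing it into W
        # with a delta-rule update (an assumed rule), then drop the cache.
        for _ in range(n_passes):
            for k, v in self.cache:
                err = v - self.W @ k                  # what W still gets wrong
                self.W += self.lr * np.outer(err, k)  # correct W along key k
        self.cache.clear()

    def recall(self, key):
        # After sleep, recall uses the weights only; the cache is empty.
        return self.W @ key


def recall_error(n_passes, dim=16, n_items=8, seed=0):
    """Mean recall error after sleeping for n_passes on random associations."""
    rng = np.random.default_rng(seed)
    keys = rng.normal(size=(n_items, dim))
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # unit-norm keys
    values = rng.normal(size=(n_items, dim))

    mem = ToySleepMemory(dim)
    for k, v in zip(keys, values):
        mem.observe(k, v)
    mem.sleep(n_passes)

    errors = [np.linalg.norm(mem.recall(k) - v) for k, v in zip(keys, values)]
    return float(np.mean(errors))


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(f"N={n:2d}  mean recall error={recall_error(n):.4f}")
```

In this toy setting, more passes let later replays repair the interference that earlier writes caused, so recall from the weights alone improves as N grows. That is only an analogy for the reported trend, not a reproduction of the paper's results.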

Why it matters

Introduces a principled mechanism for converting short-term context into long-term weights, pointing toward a way to handle very long contexts without unbounded KV cache growth, a key bottleneck for production inference.

Importance: 2/5

Verified arXiv paper from Google/CMU; novel sleep-inspired memory consolidation with practical implications for long-context inference.

Sources