RoPE Provably Fails at Long Contexts: Locality Bias and Token Consistency Both Break
A NeurIPS 2026 submission (arXiv:2605.15514) formally proves that Rotary Positional Embeddings (RoPE) fail in two fundamental ways at long context lengths: locality bias collapses (the model can no longer reliably favor nearby tokens), and token consistency breaks (the attention score for the same token varies with its position). The authors also prove that the two failures are in direct tension: adjusting RoPE's base parameter trades one failure for the other rather than resolving either.
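For readers unfamiliar with the mechanism, the following is a minimal NumPy sketch of standard RoPE attention scoring. It is not the paper's proof or code, and the function names (rope_frequencies, apply_rope, rope_score) are hypothetical helpers chosen for illustration. It assumes the common formulation in which dimension pair i rotates at frequency base^(-2i/d). The sketch shows which quantity the base parameter controls and lets you print how the score of a vector against itself changes with distance under two base values.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    """Per-pair rotation frequencies theta_i = base^(-2i/d) (standard RoPE)."""
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)

def apply_rope(x, position, base=10000.0):
    """Rotate each (even, odd) pair of x by position * theta_i."""
    x = np.asarray(x, dtype=float)
    angles = position * rope_frequencies(x.shape[-1], base)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

def rope_score(q, k, q_pos, k_pos, base=10000.0):
    """Unnormalized attention logit between a query and a key after RoPE."""
    return float(apply_rope(q, q_pos, base) @ apply_rope(k, k_pos, base))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal(64)  # illustrative head dimension, an assumption
    for base in (1e4, 1e6):
        # Score of a vector against itself at increasing distances.
        scores = [rope_score(q, q, d, 0, base) for d in (0, 16, 256, 4096, 65536)]
        print(f"base={base:g}:", np.round(scores, 2))
```

In this formulation the score depends only on the distance between the two positions, and for identical vectors it equals a sum of cosines weighted by the energy in each dimension pair. Raising the base slows the low-frequency pairs, which changes how that sum behaves with distance. This is the knob the summary above refers to.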
Why it matters
RoPE is the positional encoding used in nearly every major open-weight LLM (Llama, Mistral, Qwen, Gemma). A formal proof that it fails in theory at long contexts motivates replacement mechanisms and offers a possible explanation for the performance cliffs reported in long-document tasks.
Importance: 4/5
A NeurIPS 2026 submission formally proving a theoretical failure of the positional encoding used in virtually all major open-weight LLMs.