VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL
WeiboAI
VibeThinker-3B (arXiv 2606.16140, June 15) is a 3B dense model trained with curriculum SFT, multi-domain RL, and offline self-distillation. It scores 94.3 on AIME26 (97.1 with test-time scaling), 80.2 Pass@1 on LiveCodeBench v6, and 96.1% acceptance on unseen LeetCode contests. The authors propose the Parametric Compression-Coverage Hypothesis: reasoning compresses into compact models, while broad factual knowledge requires larger parameter counts.
Why it matters
713 upvotes on Hugging Face Daily Papers. A 3B model that matches or exceeds much larger systems on math and code benchmarks challenges core assumptions about the scale required for frontier reasoning, with significant implications for inference cost and edge deployment.
Importance: 4/5
713 HF upvotes + frontier-level reasoning in a 3B model: a paradigm-challenging result