SU-01: Gold-Medal-Level Olympiad Reasoning via Curriculum SFT and Two-Stage RL

SU-01 Team


SU-01 is a 30B-A3B model (30B total parameters, roughly 3B active) trained with reverse-perplexity curriculum SFT followed by two-stage RL, using about 340K SFT trajectories and 200 RL steps. The model reaches gold-medal-level performance on IMO, USAMO, and IPhO benchmarks and remains stable on reasoning trajectories longer than 100K tokens.
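The summary names a reverse-perplexity curriculum but does not say how it is built. The sketch below is only a rough illustration under stated assumptions. It scores each SFT trajectory by perplexity from per-token log-probabilities (assumed to come from some reference model), sorts the set, and splits it into contiguous stages. The `Trajectory` type, the scoring inputs, the sort direction, and the staging scheme are all hypothetical and are not taken from SU-01.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical container: one SFT sample plus its per-token
    # log-probabilities under an (unspecified) reference model.
    sample_id: str
    token_logprobs: list[float]


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean token log-probability)."""
    if not token_logprobs:
        raise ValueError("trajectory has no tokens")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


def curriculum_order(trajs: list[Trajectory], descending: bool = True) -> list[Trajectory]:
    """Sort samples by perplexity.

    ASSUMPTION: the direction implied by "reverse-perplexity" is not stated
    in the summary; descending=True (highest perplexity first) is a guess.
    """
    return sorted(trajs, key=lambda t: perplexity(t.token_logprobs), reverse=descending)


def make_stages(ordered: list[Trajectory], num_stages: int) -> list[list[Trajectory]]:
    """Split an ordered list into contiguous, roughly equal curriculum stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    size = max(1, math.ceil(len(ordered) / num_stages))  # avoid a zero step on empty input
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy usage with made-up log-probabilities.
trajs = [
    Trajectory("a", [-0.1, -0.1]),  # low perplexity (about 1.11)
    Trajectory("b", [-2.0, -2.0]),  # high perplexity (about 7.39)
    Trajectory("c", [-1.0, -1.0]),  # medium perplexity (about 2.72)
]
ordered = curriculum_order(trajs)
stages = make_stages(ordered, num_stages=2)
print([[t.sample_id for t in stage] for stage in stages])  # [['b', 'c'], ['a']]
```

Flipping `descending` gives an easy-to-hard ordering instead; which one SU-01 uses, and how it schedules stages during SFT, would need to be checked against the paper.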

Why it matters

Gold-medal-level performance on multiple olympiads spanning mathematics and physics is a qualitative milestone for AI reasoning. The result comes from a careful training curriculum and two-stage RL rather than exotic architectural changes. The paper drew 75 upvotes on HF Daily Papers (May 15).

Importance: 3/5

Qualitative milestone for reasoning (olympiad gold-medal level); 75 HF Daily upvotes

reasoning mathematics rl scaling long-context

Sources

official arXiv: SU-01 Olympiad Reasoning

media HF Daily Papers May 15, 2026