DeepReinforce Releases Ornith-1.0: Open-Source Coding Models That Learn Their Own RL Scaffolds
DeepReinforce
On June 25, DeepReinforce released Ornith-1.0, a family of four MIT-licensed agentic coding models (9B dense, 31B dense, 35B MoE, 397B MoE) built on Gemma 4 and Qwen 3.5 bases. Instead of relying on human-designed RL scaffolds, each model learns to generate its own task-specific harnesses during RL training, with rewards flowing back to both the scaffold-generation and solution-generation stages. The 397B flagship scores 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified, matching Claude Opus 4.7.
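The idea of one reward updating both stages can be illustrated with a toy sketch. This is not DeepReinforce's training code: it assumes a plain REINFORCE update, two hypothetical scaffold choices, two hypothetical solution choices per scaffold, and a made-up reward (`toy_reward`) that pays out only when scaffold 1 is paired with solution 1. All function names and hyperparameters here are illustrative.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_step(logits, action, reward, lr):
    """In-place REINFORCE update: reward * grad of log softmax(action)."""
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * reward * grad


def toy_reward(scaffold, solution):
    """Hypothetical task: only scaffold 1 combined with solution 1 succeeds."""
    return 1.0 if (scaffold == 1 and solution == 1) else 0.0


def train(episodes=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    scaffold_logits = [0.0, 0.0]                 # stage 1: pick a scaffold
    solution_logits = [[0.0, 0.0], [0.0, 0.0]]   # stage 2: pick a solution given the scaffold
    for _ in range(episodes):
        s = sample(softmax(scaffold_logits), rng)
        a = sample(softmax(solution_logits[s]), rng)
        r = toy_reward(s, a)
        # The same scalar reward flows back to both stages.
        reinforce_step(scaffold_logits, s, r, lr)
        reinforce_step(solution_logits[s], a, r, lr)
    return softmax(scaffold_logits), softmax(solution_logits[1])


if __name__ == "__main__":
    scaffold_probs, solution_probs = train()
    print("P(scaffold):", scaffold_probs)
    print("P(solution | scaffold 1):", solution_probs)
```

In this toy setting, both policies drift toward the rewarded pair because the scaffold choice receives credit whenever the solution built on it succeeds. A real system would work over generated text and far richer rewards.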
Why it matters
Self-scaffolding RL is a meaningful departure from fixed-harness training, and Ornith-1.0 is the first open-source model family released under an MIT license to match a recent Anthropic frontier model on agentic coding benchmarks.
Importance: 3/5
First open-source coding model family under an MIT license to match an Anthropic frontier model on SWE-Bench, using a novel self-scaffolding RL approach