DeepReinforce Releases Ornith-1.0: Open-Source Coding Models That Learn Their Own RL Scaffolds

DeepReinforce

DeepReinforce released Ornith-1.0 on June 25, a family of four MIT-licensed agentic coding models (9B dense, 31B dense, 35B MoE, 397B MoE) built on Gemma 4 and Qwen 3.5 bases. Rather than relying on human-designed RL scaffolds, each model learns to generate its own task-specific harnesses during RL training, with rewards flowing back to both the scaffold-generation and solution-generation stages. The 397B flagship scores 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified, matching Claude Opus 4.7.
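
The summary above does not spell out the training objective, so the following is only a toy sketch of what "rewards flowing back to both stages" could look like. It assumes a REINFORCE-style policy-gradient surrogate, which is an assumption for illustration and not DeepReinforce's published method. The names `Step`, `assign_credit`, and `reinforce_loss` are hypothetical, and the log-probabilities are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One stage of a two-stage episode (hypothetical structure)."""
    stage: str      # "scaffold" or "solution"
    logprob: float  # log-probability the policy assigned to this stage's output


def assign_credit(steps, reward, baseline=0.0):
    """Give every stage of the episode the same advantage (reward - baseline)."""
    advantage = reward - baseline
    return {step.stage: advantage for step in steps}


def reinforce_loss(steps, reward, baseline=0.0):
    """REINFORCE-style surrogate: -(advantage) * sum of per-stage log-probs.

    Because the sum covers both stages, minimizing this value would push
    gradient into the scaffold output as well as the solution output.
    """
    advantage = reward - baseline
    return -advantage * sum(step.logprob for step in steps)


# Illustrative episode: the model first writes a harness, then a solution,
# and a single task reward (e.g. tests passed) scores the whole episode.
episode = [Step("scaffold", -2.0), Step("solution", -3.0)]
credit = assign_credit(episode, reward=1.0, baseline=0.25)
loss = reinforce_loss(episode, reward=1.0, baseline=0.25)
```

In a fixed-harness setup, by contrast, only the solution stage would appear in `steps`, so the scaffold would receive no learning signal at all.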

Why it matters

Self-scaffolding RL is a meaningful departure from fixed-harness training, and this is the first open-source model family to match a recent Anthropic frontier model on agentic coding benchmarks under an MIT license.

Importance: 3/5

First open-source coding model family to match an Anthropic frontier model on SWE-Bench under an MIT license, using a novel self-scaffolding RL approach.

open-source mit coding reinforcement-learning swe-bench moe release

Sources

official deepreinforce-ai/Ornith-1.0-35B — Hugging Face

media DeepReinforce Releases Ornith-1.0 — MarkTechPost

media DeepReinforce releases Ornith-1.0 open-source coding models — Testing Catalog