Prime Intellect Releases prime-rl v0.6.0 for Agentic RL on Trillion-Parameter MoE Models

Prime Intellect

Research | official + media | 3 sources | ~1 min read

Prime Intellect released prime-rl v0.6.0 (June 22–23, 2026), an open-source framework for asynchronous reinforcement learning on trillion-parameter mixture-of-experts (MoE) models, targeting long-horizon agentic tasks such as software engineering. The framework runs the trainer and the inference engine as independent asynchronous processes. In a GLM-5 demonstration, it ran software engineering (SWE) tasks at a 131K sequence length with a rollout batch size of 256 and step times under five minutes, using 28 H200 nodes. Router replay reduces the KL mismatch between trainer and inference by roughly 10x.
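To illustrate why router replay can shrink the trainer-inference KL mismatch, here is a minimal toy sketch in Python. It is not prime-rl code and uses none of its APIs. It assumes router replay means recording the experts chosen during inference and forcing the trainer to reuse them. All names and numbers (EXPERT_LOGITS, INFERENCE_ROUTER, TRAINER_ROUTER, TRAINER_OUTPUT_NOISE, mean_kl) are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def top1(router_logits):
    # Index of the expert with the highest router logit.
    return max(range(len(router_logits)), key=lambda i: router_logits[i])

# Hypothetical toy experts: each emits fixed logits over a 3-token vocabulary.
EXPERT_LOGITS = [
    [2.0, 0.0, -1.0],
    [0.0, 2.0, -1.0],
]

# Hypothetical router logits for four tokens. The trainer's values differ
# slightly from inference, standing in for kernel and precision differences.
INFERENCE_ROUTER = [[1.00, 0.99], [0.20, 0.90], [0.50, 0.51], [1.50, 0.10]]
TRAINER_ROUTER = [[0.99, 1.00], [0.21, 0.90], [0.51, 0.50], [1.50, 0.11]]

# Assumed small numerical drift on the trainer's expert outputs.
TRAINER_OUTPUT_NOISE = 0.01

def mean_kl(replay):
    """Mean KL(inference || trainer) over the toy tokens.

    With replay=True the trainer reuses the expert chosen at inference;
    otherwise it routes with its own, slightly different, router logits.
    """
    total = 0.0
    for inf_r, trn_r in zip(INFERENCE_ROUTER, TRAINER_ROUTER):
        inf_expert = top1(inf_r)
        trn_expert = inf_expert if replay else top1(trn_r)
        p = softmax(EXPERT_LOGITS[inf_expert])
        noisy = [x + TRAINER_OUTPUT_NOISE * i
                 for i, x in enumerate(EXPERT_LOGITS[trn_expert])]
        q = softmax(noisy)
        total += kl(p, q)
    return total / len(INFERENCE_ROUTER)

if __name__ == "__main__":
    print("mean KL without replay:", mean_kl(replay=False))
    print("mean KL with replay:   ", mean_kl(replay=True))
```

In this toy, two of the four tokens sit on a near-tie between experts, so the trainer's tiny router perturbation flips their routing and dominates the mismatch. Replaying the inference-time choice leaves only the small output drift. The toy does not attempt to reproduce the roughly 10x figure reported for prime-rl.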

Why it matters

Previously, agentic RL at trillion-parameter scale required clusters beyond most research budgets. prime-rl v0.6.0 demonstrates that it is feasible on 28 H200 nodes, which is within reach of mid-sized labs, and the open-source release lets other organizations replicate the capability.

Importance: 3/5

Open-source framework enabling trillion-parameter agentic RL on 28 H200 nodes; brings a previously inaccessible scale within reach of mid-sized labs.

Sources