Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic

Sergey Rodionov submitted a paper (arXiv:2605.05138, May 6) presenting a coding-agent approach to ARC-AGI-3 in which the agent maintains an executable Python world model, validates it against prior observations, and applies a simplicity bias by refactoring the model. The approach was tested on 25 public ARC-AGI-3 games without game-specific logic: 7 games were fully solved, 6 games scored above 75% RHAE, and the mean RHAE was 32.58%.
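The validate-then-simplify loop can be illustrated with a toy sketch. This is not the paper's implementation: the grid state, the action names, the `step` function convention, and the use of source length as a stand-in for simplicity are all assumptions made for illustration. Each candidate world model is a piece of Python source; a candidate is kept only if it reproduces every recorded transition, and the shortest surviving candidate is preferred.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy setting: the state is an (x, y) position on a grid.
State = Tuple[int, int]
Transition = Tuple[State, str, State]  # (state_before, action, state_after)
Model = Callable[[State, str], State]

# Recorded observations the world model must reproduce (made-up data).
HISTORY: List[Transition] = [
    ((0, 0), "right", (1, 0)),
    ((1, 0), "right", (2, 0)),
    ((2, 0), "left", (1, 0)),
]

# Candidate models as source text, as a coding agent might write them.
WRONG_MODEL = '''
def step(state, action):
    return state
'''

LOOKUP_MODEL = '''
def step(state, action):
    table = {
        ((0, 0), "right"): (1, 0),
        ((1, 0), "right"): (2, 0),
        ((2, 0), "left"): (1, 0),
    }
    return table.get((state, action), state)
'''

RULE_MODEL = '''
def step(state, action):
    x, y = state
    dx = {"right": 1, "left": -1}.get(action, 0)
    return (x + dx, y)
'''

def load_model(source: str) -> Model:
    """Execute model source and return its `step` function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["step"]

def mismatches(model: Model, history: List[Transition]) -> List[Transition]:
    """Return the recorded transitions that the model fails to predict."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]

def select_model(sources: List[str], history: List[Transition]) -> Optional[str]:
    """Keep candidates consistent with the history, then prefer the shortest.

    Source length is only a crude proxy for simplicity (an assumption here).
    """
    consistent = [src for src in sources if not mismatches(load_model(src), history)]
    return min(consistent, key=len) if consistent else None

best = select_model([WRONG_MODEL, LOOKUP_MODEL, RULE_MODEL], HISTORY)
print(best)
```

In this toy case the do-nothing model is shortest but fails validation, the lookup table passes but merely memorizes the history, and the rule-based model both passes and is shorter than the lookup table, so it is selected.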

Why it matters

ARC-AGI-3 is a new and significantly harder generalization benchmark. This work establishes a game-general baseline for it and provides evidence that verifier-driven executable world models are a viable path, which feeds into the ongoing debate over symbolic versus neural approaches to reasoning.

Importance: 2/5

First game-general baseline on ARC-AGI-3, a newly significant reasoning benchmark

reasoning coding-agent paper benchmark agents

Sources

official [2605.05138] Executable World Models for ARC-AGI-3 in the Era of Coding Agents — arXiv