Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning

Carnegie Mellon University / Ohio State University

Research official 1 src. ~1 min

The paper provides the first theoretical proof that transformer-based agents learn depth-first search mechanisms purely from sparse RL feedback, without expert demonstrations. A two-head transformer is constructed where one head tracks prior actions and another detects failures and triggers backtracking. Under a depth-wise curriculum, DFS emerges in stages: models trained on shallow trees generalize to deeper ones, and imbalanced goal distributions cause return discounting to produce a prioritized DFS variant.

Why it matters

Fills a major theoretical gap by explaining why RL training produces search-capable agents and provides mechanistic insight into how transformer attention heads specialize during RL — directly relevant to understanding and designing reasoning models.

Importance: 3/5

Rigorous theory paper from CMU/OSU; first proof that transformers learn DFS from RL alone — has direct implications for understanding frontier reasoning models.

Sources