Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning
Carnegie Mellon University / Ohio State University
The paper provides the first theoretical proof that transformer-based agents learn depth-first search mechanisms purely from sparse RL feedback, without expert demonstrations. A two-head transformer is constructed where one head tracks prior actions and another detects failures and triggers backtracking. Under a depth-wise curriculum, DFS emerges in stages: models trained on shallow trees generalize to deeper ones, and imbalanced goal distributions cause return discounting to produce a prioritized DFS variant.
Why it matters
Fills a major theoretical gap by explaining why RL training produces search-capable agents and provides mechanistic insight into how transformer attention heads specialize during RL — directly relevant to understanding and designing reasoning models.
Importance: 3/5
Rigorous theory paper from CMU/OSU; first proof that transformers learn DFS from RL alone — has direct implications for understanding frontier reasoning models.