TMAS: Scaling Test-Time Compute via Multi-Agent Synergy with Hierarchical Memory


TMAS scales test-time compute through structured multi-agent coordination. It pairs two hierarchical memory systems, an experience bank that holds reliable intermediate results and a guidelines bank that records strategies already explored, with a hybrid-reward reinforcement learning scheme. According to the authors, sharing this memory keeps parallel reasoning trajectories from repeating each other's work, and the approach scales better with added compute on challenging reasoning benchmarks.
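To make the role of the two memory banks concrete, here is a minimal Python sketch of how parallel agents might consult shared memory before spending compute. It is an illustration under stated assumptions, not the paper's implementation: the names SharedMemory, claim_strategy, run_agent, and toy_solve are hypothetical, the banks are plain dictionaries, and a boolean "verified" flag stands in for whatever reliability check TMAS actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class SharedMemory:
    """Hypothetical two-bank memory shared by parallel reasoning agents."""

    # Experience bank: intermediate results judged reliable, keyed by sub-problem.
    experience: Dict[str, str] = field(default_factory=dict)
    # Guidelines bank: strategies already explored for each sub-problem.
    guidelines: Dict[str, Set[str]] = field(default_factory=dict)

    def lookup(self, subproblem: str) -> Optional[str]:
        """Return a stored reliable result, or None if there is none yet."""
        return self.experience.get(subproblem)

    def record_result(self, subproblem: str, result: str, verified: bool) -> None:
        """Store a result only if it passed the (assumed) reliability check."""
        if verified:
            self.experience[subproblem] = result

    def claim_strategy(self, subproblem: str, strategy: str) -> bool:
        """Mark a strategy as explored; return False if it was already tried."""
        tried = self.guidelines.setdefault(subproblem, set())
        if strategy in tried:
            return False
        tried.add(strategy)
        return True


Solver = Callable[[str, str], Tuple[str, bool]]


def run_agent(memory: SharedMemory, subproblem: str,
              strategies: List[str], solve: Solver) -> Optional[str]:
    """One agent's attempt: reuse stored results, skip explored strategies."""
    cached = memory.lookup(subproblem)
    if cached is not None:
        return cached  # another trajectory already solved this
    for strategy in strategies:
        if not memory.claim_strategy(subproblem, strategy):
            continue  # already explored, do not repeat the work
        result, verified = solve(subproblem, strategy)
        memory.record_result(subproblem, result, verified)
        if verified:
            return result
    return None


# Toy solver (an assumption for the demo): only "induction" yields a verified result.
calls: List[str] = []


def toy_solve(subproblem: str, strategy: str) -> Tuple[str, bool]:
    calls.append(strategy)
    return f"{subproblem} via {strategy}", strategy == "induction"


memory = SharedMemory()
first = run_agent(memory, "lemma-1", ["brute force", "induction"], toy_solve)
second = run_agent(memory, "lemma-1", ["brute force", "induction"], toy_solve)
```

In this toy run the second agent returns the stored result without calling the solver again, which is the kind of redundancy the summary says shared memory is meant to remove. A real system would also need concurrency control and a principled reliability check, both omitted here.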

Why it matters

TMAS addresses an underexplored problem, coordination overhead in multi-agent inference scaling, and offers a deployable route to better reasoning that avoids naively duplicating effort across agents.

Importance: 2/5

Hierarchical memory + hybrid RL for multi-agent test-time compute; addresses coordination overhead that limits naive parallel agent scaling on reasoning tasks.

reasoning multi-agent rl inference

Sources

official TMAS: Scaling Test-Time Compute via Multi-Agent Synergy — arXiv:2605.10344

media TMAS — Hugging Face Daily Papers