S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence in VLMs

Nanyang Technological University


S-Agent reframes spatial reasoning in vision-language models as an agentic process: a VLM planner dispatches spatial tools to accumulate evidence across 2D-to-3D projections and time, maintaining scene and agent memory across frames. The approach is training-free for existing models, and a fine-tuned S-Agent-8B matches closed-source models on spatial benchmarks.
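
The planner-and-tools loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `Memory`, `lift_to_3d`, `track_agent`, `plan`, and `run` are hypothetical and are not taken from the S-Agent paper or its code, the planner is a fixed rule standing in for a VLM call, and the 2D-to-3D step is a plain pinhole back-projection in camera coordinates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Memory:
    """Hypothetical memory kept across frames."""
    scene: Dict[str, Point3D] = field(default_factory=dict)  # object name -> latest 3D position
    agent: List[Point3D] = field(default_factory=list)       # one camera position per frame


def lift_to_3d(frame: dict, memory: Memory) -> None:
    """Toy spatial tool: back-project pixel (u, v) with depth d via a pinhole model.

    Positions stay in camera coordinates; a real system would also apply the camera pose.
    """
    f, cx, cy = frame["focal"], frame["cx"], frame["cy"]
    for name, (u, v, d) in frame["detections"].items():
        memory.scene[name] = ((u - cx) * d / f, (v - cy) * d / f, d)


def track_agent(frame: dict, memory: Memory) -> None:
    """Toy spatial tool: record where the camera was for this frame."""
    memory.agent.append(frame["camera_position"])


TOOLS: Dict[str, Callable[[dict, Memory], None]] = {
    "lift_to_3d": lift_to_3d,
    "track_agent": track_agent,
}


def plan(frame: dict, memory: Memory) -> List[str]:
    """Stand-in for the VLM planner: a fixed rule rather than a model call."""
    calls = ["track_agent"]
    if frame["detections"]:
        calls.append("lift_to_3d")
    return calls


def run(frames: List[dict]) -> Memory:
    """Dispatch the planned tools on each frame, accumulating evidence in memory."""
    memory = Memory()
    for frame in frames:
        for tool_name in plan(frame, memory):
            TOOLS[tool_name](frame, memory)
    return memory
```

Calling `run` on a list of frame dictionaries returns a `Memory` whose `scene` holds the latest 3D estimate for each detected object and whose `agent` holds one camera position per frame, which is the kind of accumulated state a planner could consult before answering a spatial question.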

Why it matters

Shows that tool-augmented agency can substitute for brute-force scale in spatial intelligence: an 8B model matches frontier closed-source systems.

Importance: 2/5

42 upvotes on Hugging Face Daily Papers; improves spatial reasoning through tool use without additional training.

Sources