S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence in VLMs
Nanyang Technological University
S-Agent reframes spatial reasoning in vision-language models as an agentic process: a VLM planner dispatches spatial tools to accumulate evidence across 2D-to-3D projections and time, maintaining scene and agent memory across frames. The approach is training-free for existing models, and a fine-tuned S-Agent-8B matches closed-source models on spatial benchmarks.
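As a rough illustration of the planner, tool, and memory loop described above, here is a minimal sketch. It is not the paper's implementation: `Memory`, `lift_to_3d`, `plan`, and `run` are hypothetical names, the planner is a hard-coded stand-in for a VLM, and the "projection" is a toy formula rather than a real camera model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]
# A frame maps an object name to ((u, v) pixel position, depth). Hypothetical format.
Frame = Dict[str, Tuple[Tuple[float, float], float]]


@dataclass
class Memory:
    """Hypothetical memory: scene state plus a record of the agent's own history."""
    scene: Dict[str, Point3D] = field(default_factory=dict)  # object -> latest 3D estimate
    agent: List[int] = field(default_factory=list)           # indices of frames processed


def lift_to_3d(uv: Tuple[float, float], depth: float) -> Point3D:
    """Toy 2D-to-3D lift: scales the pixel coordinates by depth (no camera intrinsics)."""
    u, v = uv
    return (u * depth, v * depth, depth)


# Tool registry the planner can dispatch to; a real system would hold several tools.
TOOLS: Dict[str, Callable[..., Point3D]] = {"lift_to_3d": lift_to_3d}


def plan(frame: Frame) -> List[Tuple[str, str, tuple]]:
    """Stand-in for the VLM planner: requests one lift per observed object."""
    return [("lift_to_3d", name, (uv, depth)) for name, (uv, depth) in frame.items()]


def run(frames: List[Frame], memory: Optional[Memory] = None) -> Memory:
    """Processes frames in order, updating scene and agent memory after each one."""
    if memory is None:
        memory = Memory()
    for idx, frame in enumerate(frames):
        for tool_name, obj, args in plan(frame):
            memory.scene[obj] = TOOLS[tool_name](*args)  # newer evidence overwrites older
        memory.agent.append(idx)
    return memory
```

Calling `run` on a list of frames returns a `Memory` whose `scene` holds the most recent 3D estimate per object and whose `agent` lists the frames seen so far. Overwriting older estimates is a simplification chosen for brevity; an actual system might fuse evidence across frames instead.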
Why it matters
Shows that tool-augmented agency can substitute for brute-force scale in spatial intelligence: a fine-tuned 8B model matches frontier closed-source systems on spatial benchmarks.
Importance: 2/5
42 upvotes on HF Daily Papers; improves spatial reasoning through tool use without additional training.