OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents

Tencent Hunyuan


OpenSearch-VL provides a fully open framework for training multimodal deep-search agents that operate as closed-loop systems: they inspect images, crop regions of interest, issue web and image searches, visit retrieved pages, and answer grounded in the gathered evidence. The paper introduces a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures, reports average improvements of over 10 points across seven benchmarks, and releases all data, code, and model checkpoints.
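The fatal-aware idea can be sketched roughly as follows. This is an illustrative reconstruction only, not the released implementation: all names (`Turn`, `Trajectory`, `turn_rewards`) and the specific reward values are hypothetical, chosen to show how rewards might be masked downstream of a fatal tool failure so cascading errors do not pollute the policy gradient.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of fatal-aware credit assignment over a multi-turn
# tool trajectory; names and reward magnitudes are illustrative.

@dataclass
class Turn:
    tool: str            # e.g. "crop", "web_search", "visit"
    ok: bool             # whether the tool call succeeded
    fatal: bool = False  # a failure that corrupts every later step

@dataclass
class Trajectory:
    turns: list = field(default_factory=list)
    answer_correct: bool = False

def turn_rewards(traj: Trajectory) -> list:
    """Assign per-turn rewards, zeroing out every turn after the first
    fatal tool failure so the cascade is excluded from training signal."""
    rewards, poisoned = [], False
    for t in traj.turns:
        if poisoned:
            rewards.append(0.0)  # masked: downstream of a fatal failure
            continue
        rewards.append(1.0 if t.ok else -0.5)
        if t.fatal:
            poisoned = True
    # terminal answer reward counts only if the evidence chain stayed intact
    if traj.turns and not poisoned:
        rewards[-1] += 1.0 if traj.answer_correct else -1.0
    return rewards

traj = Trajectory(
    turns=[Turn("crop", ok=True),
           Turn("web_search", ok=False, fatal=True),
           Turn("visit", ok=True)],
    answer_correct=True,
)
print(turn_rewards(traj))  # → [1.0, -0.5, 0.0]
```

In this sketch the final `visit` turn and the answer reward are both masked, since they depend on a search call that fatally failed; how the actual algorithm detects and propagates fatal failures is detailed in the paper.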

Why it matters

One of the first fully open recipes for training multimodal agentic search systems competitive with proprietary models; the fatal-aware RL training approach addresses a practical gap in multi-step agentic pipelines.

Importance: 3/5

92 HF trending upvotes; fully open recipe achieving 10+ point average improvement on 7 benchmarks; addresses cascading tool failures in multimodal agent training.

Sources