Alibaba Launches Qwen3.7-Plus: Multimodal Agent with Vision, Reasoning, and Autonomous Execution

Alibaba / Qwen

Models / LLM | official + media | 3 sources | ~1 min read

Alibaba's Qwen team released Qwen3.7-Plus on June 2, 2026, adding native image and video understanding to the earlier text-only Qwen3.7-Max. The model combines deep reasoning, self-programming, tool invocation, verification, and autonomous iteration in a single agentic loop. It scored 79 on screen-understanding benchmarks, outperforming GPT-5.4 and Gemini-3.1 Pro on that task. It is available via the Alibaba Cloud Bailian API at $0.40 per million input tokens and $1.60 per million output tokens. Alibaba shares rose over 6% on the announcement.
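To make the quoted pricing concrete, here is a minimal Python sketch of the per-request arithmetic. The two rates come from the summary above. The function name, the constant names, and the example token counts are hypothetical, and the sketch assumes a flat per-token rate with no tiering, caching discounts, or separate pricing for image or video input.

```python
# Rates as quoted in this item, in USD per one million tokens.
# Assumption: flat pricing; verify against current Bailian pricing before relying on it.
INPUT_USD_PER_MILLION = 0.40
OUTPUT_USD_PER_MILLION = 1.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted rates (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50,000 input tokens and 8,000 output tokens.
    print(f"${estimate_cost_usd(50_000, 8_000):.4f}")
```

At these rates output tokens cost four times as much as input tokens, so long agentic runs that generate a lot of text are dominated by the output side of the bill.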

Why it matters

First Qwen release to unify vision and agentic execution in one model, enabling autonomous end-to-end workflows (including building a full app over 11 hours without human intervention) and advancing the frontier of Chinese multimodal agents.

Importance: 4/5

Frontier multimodal agent model from Alibaba with a reported state-of-the-art score of 79 on screen-understanding benchmarks; Alibaba shares rose more than 6% on announcement day.

qwen alibaba multimodal agents gui-agent agentic closed-source china

Sources

official Qwen3.7-Plus: Multimodal Agent Intelligence — Qwen Blog

media Alibaba's Qwen Team Launches Qwen3.7-Plus — MarkTechPost

media Alibaba (9988.HK) Launches Qwen3.7-Plus AI Model, Stock Surges Over 6% — GuruFocus