Alibaba Launches Qwen3.7-Plus: Multimodal Agent with Vision, Reasoning, and Autonomous Execution
Alibaba / Qwen
Alibaba's Qwen team released Qwen3.7-Plus on June 2, 2026, adding native image and video understanding to the earlier text-only Qwen3.7-Max. The model combines deep reasoning, self-programming, tool invocation, verification, and autonomous iteration in a single agentic loop. It scores 79 on screen-understanding benchmarks, outperforming GPT-5.4 and Gemini-3.1 Pro on that task. The model is available via the Alibaba Cloud Bailian API at $0.40 per million input tokens and $1.60 per million output tokens. Alibaba shares rose over 6% on the announcement.
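To put the quoted pricing in concrete terms, here is a minimal Python sketch of the arithmetic. It uses only the two per-million-token prices stated above. The function name and the sample workload (5 million input tokens, 1 million output tokens) are hypothetical, and the sketch assumes flat pricing with no tiering, caching discounts, or separate rates for image and video input, none of which the announcement summary specifies.

```python
# Prices as quoted in the announcement, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 1.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate workload cost at the quoted flat per-million-token prices.

    Hypothetical helper for illustration; assumes no tiered or cached pricing.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 5M input tokens and 1M output tokens.
cost = estimate_cost_usd(5_000_000, 1_000_000)
print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions the sample workload comes to roughly $3.60, with $2.00 of that from input and $1.60 from output.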
Why it matters
First Qwen release to unify vision and agentic execution in one model. It enables autonomous end-to-end workflows, including building a full app over 11 hours without human intervention, and advances the frontier of Chinese multimodal agents.
Importance: 4/5
Frontier multimodal agent model from Alibaba with state-of-the-art results on screen-understanding benchmarks; Alibaba shares rose over 6% on announcement day.