Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale

Qwen Team, Alibaba

Research · official + media · 2 sources · ~1 min read

Qwen-AgentWorld presents two foundation world models (35B and 397B parameters) trained on over 10 million interaction trajectories across seven domains, using a three-stage pipeline: capability injection, next-state-prediction activation, and RL refinement. The system serves as both a scalable environment simulator for RL training and a warm-up stage for downstream agent tasks, accompanied by the new AgentWorldBench benchmark.
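As a rough illustration of the simulator role described above, the sketch below rolls out an agent against a next-state predictor instead of a real environment. It is a minimal toy under stated assumptions, not the Qwen team's implementation: `toy_world_model`, `toy_policy`, `rollout`, and `Trajectory` are hypothetical names, and the string-formatting "model" stands in for an actual language world model that would generate the next observation as text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical aliases for illustration; states and actions are plain text here.
State = str
Action = str


@dataclass
class Trajectory:
    """Collected (state, action, next_state) transitions from one rollout."""
    steps: List[Tuple[State, Action, State]] = field(default_factory=list)


def toy_world_model(state: State, action: Action) -> State:
    """Stand-in for next-state prediction (assumption: a real language
    world model would generate this text rather than format it)."""
    return f"{state} -> [{action}]"


def toy_policy(state: State) -> Action:
    """Stand-in agent: alternates between two actions based on history length."""
    return "click" if state.count("->") % 2 == 0 else "type"


def rollout(
    world_model: Callable[[State, Action], State],
    policy: Callable[[State], Action],
    initial_state: State,
    horizon: int,
) -> Trajectory:
    """Generate a simulated trajectory without touching a real environment."""
    traj = Trajectory()
    state = initial_state
    for _ in range(horizon):
        action = policy(state)
        next_state = world_model(state, action)  # predicted, not observed
        traj.steps.append((state, action, next_state))
        state = next_state
    return traj


if __name__ == "__main__":
    simulated = rollout(toy_world_model, toy_policy, "home_page", horizon=3)
    for state, action, next_state in simulated.steps:
        print(f"{action!r}: {state!r} => {next_state!r}")
```

Trajectories collected this way could then feed an RL update in place of real-environment rollouts, which is the cost saving the summary points to; how closely the predicted states track real dynamics is what determines whether that substitution is useful.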

Why it matters

Language world models that faithfully simulate environment dynamics could reduce the cost of RL data collection and let agents practice in simulation before real deployment. At 397B parameters, the larger of the two models is the largest dedicated agent world model to date.

Importance: 3/5

Largest dedicated language world model (397B) for agent RL simulation; serves as both an environment simulator and a warm-up stage for downstream agent tasks.

Tags: agents, reasoning, rl, multimodal, paper, simulation

Sources

official Qwen-AgentWorld: Language World Models for General Agents — arXiv

media Qwen-AgentWorld: Language World Models for Reliable AI Agents — nxcode.io