Qwen-AgentWorld: Language World Models for General Agents at 35B and 397B Scale
Qwen Team, Alibaba
Qwen-AgentWorld presents two foundation world models (35B and 397B parameters) trained on over 10 million interaction trajectories across seven domains, using a three-stage pipeline: capability injection, next-state-prediction activation, and RL refinement. The models serve both as a scalable environment simulator for RL training and as a warm-up stage for downstream agent tasks. They are accompanied by a new benchmark, AgentWorldBench.
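To make the "environment simulator" role concrete, here is a minimal sketch of how a next-state predictor could be wrapped so that an agent steps through it like an ordinary environment. Every name in it (WorldModelEnv, NextStatePredictor, toy_predictor) is hypothetical and not taken from the paper, and a tiny hand-written rule set stands in for the language model, so this illustrates only the interface, not Qwen-AgentWorld itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signature (an assumption, not the paper's API):
# (history of (action, observation) pairs, new action) -> predicted observation.
NextStatePredictor = Callable[[List[Tuple[str, str]], str], str]


@dataclass
class WorldModelEnv:
    """Wraps a next-state predictor so an agent can step through it like an environment."""

    predict: NextStatePredictor
    initial_observation: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def reset(self) -> str:
        # Clear the trajectory and return the starting observation.
        self.history = []
        return self.initial_observation

    def step(self, action: str) -> str:
        # The predictor sees the trajectory so far plus the new action
        # and returns the simulated next observation.
        observation = self.predict(self.history, action)
        self.history.append((action, observation))
        return observation


def toy_predictor(history: List[Tuple[str, str]], action: str) -> str:
    """Stand-in for a language world model: a tiny hand-written shell simulation."""
    # Reconstruct the simulated state (the set of files) from past actions.
    files = {a.split(" ", 1)[1] for a, _ in history if a.startswith("touch ")}
    if action.startswith("touch "):
        return ""
    if action == "ls":
        return " ".join(sorted(files))
    return f"command not found: {action}"


env = WorldModelEnv(predict=toy_predictor, initial_observation="$")
env.reset()
env.step("touch b.txt")
env.step("touch a.txt")
listing = env.step("ls")
print(listing)
```

In a real setup, the predict callable would presumably send the serialized trajectory to the world model and return its generated observation; the agent-facing reset/step loop would stay the same.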
Why it matters
Language world models that faithfully simulate environment dynamics could reduce the cost of RL data collection and let agents practice in simulation before real deployment. At 397B parameters, this is the largest dedicated agent world model to date.
Importance: 3/5
Largest dedicated language world model (397B) for agent RL simulation; doubles as a simulator and a warm-up stage for downstream tasks.