Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL
Princeton University
Odysseus trains vision-language models to play Super Mario Land for 100+ consecutive decision turns using a PPO variant with a lightweight turn-level critic. Pretrained VLMs provide strong action priors that significantly improve sample efficiency compared with classic deep RL trained from scratch. The framework achieves at least 3× the average game progress of frontier models while preserving general-domain VLM capabilities.
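The paper's exact critic design and PPO modifications are not detailed in this summary; as a rough illustration only, a turn-level critic typically means one value estimate per decision turn, with advantages computed across turns and plugged into PPO's clipped objective. A minimal sketch (function names, GAE, and all hyperparameters are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation at turn granularity:
    `rewards[t]` and `values[t]` are one scalar per decision turn,
    not per token, keeping the critic lightweight."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # bootstrap 0 at episode end
        delta = rewards[t] + gamma * next_v - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate over per-turn log-probs of the
    VLM's chosen actions (to be minimized)."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))
```

With identical old and new policies the ratio is 1 everywhere, so the loss reduces to the negative mean advantage; training then pushes probability mass toward turns with positive advantage over the 100+-turn episode.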
Why it matters
Long-horizon interactive decision-making (100+ turns) with coordinated perception, reasoning, and action remains an open challenge for current VLMs. Odysseus demonstrates a practical RL recipe that avoids catastrophic forgetting while substantially outperforming frontier models, with findings likely transferable to real-world agentic tasks.
Importance: 2/5
Novel RL recipe for long-horizon VLM decision-making from Princeton, outperforming frontier models in the 100+ turn setting.