AI2 Open-Sources MolmoAct2: Robotics VLA That Claims to Beat GPT-5 on Embodied Reasoning

AI2


Allen Institute for AI releases MolmoAct2, an open-source robotic control system built around MolmoER, a vision-language model trained on 3.3M samples for spatial reasoning. The release ships three new datasets, among them the largest open bimanual dataset to date (720 hours of teleoperated trajectories), plus an open-source action tokenizer (OpenFAST) and MolmoThink, an adaptive reasoning mechanism that re-predicts depth tokens only for changed scene regions to reduce latency. Full model weights, training code, and datasets are released publicly.
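AI2 hasn't published MolmoThink's internals in this summary, but the stated idea (cache depth tokens per scene region and recompute only where the input changed) can be sketched in a few lines. Everything here is illustrative: the function name, `predict_fn`, and the change threshold are assumptions, not AI2's API.

```python
import numpy as np

def adaptive_depth_tokens(prev_patches, curr_patches, cached_tokens,
                          predict_fn, threshold=0.1):
    """Sketch of change-gated re-prediction (not AI2's implementation).

    prev_patches, curr_patches: (N, D) arrays of per-region features
    for consecutive frames.
    cached_tokens: (N,) depth tokens predicted for the previous frame.
    predict_fn: maps an (M, D) array of features to (M,) depth tokens.
    """
    # Per-region change magnitude between consecutive frames.
    diff = np.linalg.norm(curr_patches - prev_patches, axis=1)
    changed = diff > threshold

    tokens = cached_tokens.copy()
    if changed.any():
        # Only changed regions pay the prediction cost; the rest
        # reuse their cached tokens, which is where the latency win comes from.
        tokens[changed] = predict_fn(curr_patches[changed])
    return tokens, changed
```

With a mostly static scene, `predict_fn` runs on a small fraction of regions per frame instead of all of them, so cost scales with scene motion rather than frame size.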

Why it matters

MolmoER reportedly outperforms GPT-5 and Gemini Robotics ER-1.5 on embodied reasoning benchmarks across seven tasks. Releasing the largest open bimanual dataset alongside full training code is a significant open-science contribution, especially as frontier labs keep similar resources proprietary.

Importance: 3/5

An open-source robotics model from AI2 claiming SOTA over GPT-5 on embodied reasoning, with full dataset and code release.
