World Action Models: First Systematic Survey of Embodied Foundation Models Unifying World Modeling and Action
OpenMOSS
This survey defines World Action Models (WAMs) as embodied foundation models that unify predictive state modeling with action generation. The definition targets a limitation of Vision-Language-Action (VLA) models, which learn reactive mappings without explicitly modeling environmental dynamics. The paper provides the first formal taxonomy of WAMs, distinguishing Cascaded and Joint variants, and analyzes data sources, training protocols, and evaluation challenges.
Why it matters
As robotics foundation models move toward real-world deployment, the distinction between reactive models and those that internally model world dynamics becomes critical for safety and generalization.
Importance: 2/5
First systematic WAM taxonomy; 33 upvotes on HF Daily Papers