NVIDIA Releases Cosmos 3: Open Omnimodal World Foundation Model for Physical AI
NVIDIA
NVIDIA released Cosmos 3, the first fully open omnimodal foundation model for physical AI reasoning, trained on 20 trillion tokens of multimodal data, including roughly 1 billion images, 400 million videos, ambient audio, and action sequences. Built on a mixture-of-transformers architecture that unifies vision reasoning, world generation, and action prediction, it ranks first on eight or more vision-reasoning and world-generation leaderboards. Cosmos 3 Super and Cosmos 3 Nano are available now on build.nvidia.com, Hugging Face, and GitHub under the OpenMDW-1.1 license.
Why it matters
Cosmos 3 is the first open foundation model to unify perception, world simulation, and action prediction for robotics and autonomous-vehicle (AV) training. It drew 8,680 upvotes on Hugging Face (HF) Daily Papers.
Importance: 5/5
Paradigm-level open omnimodal world model from NVIDIA; ranked first on 8+ leaderboards; 8,680 upvotes on HF Daily Papers (top paper of the day); confirmed by official NVIDIA sources, the HF blog, and Axios.