NVIDIA Releases Cosmos 3: Open Omnimodal World Foundation Model for Physical AI

NVIDIA

Research, official + media, 4 sources, ~1 min read

NVIDIA released Cosmos 3, the first fully open omnimodal foundation model for physical AI reasoning, trained on 20T tokens of multimodal data including ~1B images, 400M videos, ambient audio, and action sequences. Built on a mixture-of-transformers architecture that unifies vision reasoning, world generation, and action prediction, it ranks first on eight or more vision-reasoning and world-generation leaderboards. Cosmos 3 Super and Nano are immediately available on build.nvidia.com, Hugging Face, and GitHub under the OpenMDW-1.1 license.

Why it matters

First open foundation model to unify perception, world simulation, and action prediction for robotics and autonomous-vehicle (AV) training; 8,680 upvotes on Hugging Face Daily Papers.

Importance: 5/5

Paradigm-level open omnimodal world model from NVIDIA; ranks first on 8+ leaderboards; 8,680 upvotes on Hugging Face Daily Papers (top paper of the day); confirmed by the official NVIDIA newsroom, the Hugging Face blog, and Axios.

world-models multimodal robotics embodied-ai open-weights paper physical-ai

Sources

official Cosmos 3: Omnimodal World Models for Physical AI — arXiv

official Welcome NVIDIA Cosmos 3 — Hugging Face Blog

official NVIDIA Launches Cosmos 3 — NVIDIA Newsroom

media Nvidia's Cosmos 3 open AI world model — Axios