Orca: BAAI's General World Foundation Model Trained on 125K Hours of Video

BAAI

Research · official + media · 2 sources · ~1 min read

Orca is a general world foundation model from BAAI trained on 125K hours of video and 160M event annotations. It introduces Next-State-Prediction as a unified objective, combining unconscious learning from dense video transitions and conscious learning from language-described events. Evaluated on text generation, image prediction, and embodied action, it outperforms same-scale specialized baselines across all three modalities.
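The card does not explain how one objective can cover two very different data sources. The minimal Python sketch below illustrates the general idea of casting both as next-state-prediction examples. It is hypothetical. The names (`NextStateExample`, `video_transitions`, `event_transitions`) and the use of string frame IDs as "states" are illustrative assumptions. They are not Orca's actual data pipeline or API, and the summary does not say how BAAI encodes states or events.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class NextStateExample:
    """Hypothetical record: predict `next_state` from `state` (and optional event text)."""
    state: str
    next_state: str
    event: Optional[str] = None  # None for unannotated ("unconscious") video transitions


def video_transitions(frames: Sequence[str]) -> List[NextStateExample]:
    # Dense transitions: every consecutive pair of frames becomes one example.
    return [NextStateExample(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def event_transitions(
    annotations: Sequence[Tuple[str, str, str]],
) -> List[NextStateExample]:
    # Language-described events: (state_before, event_text, state_after) triples.
    return [NextStateExample(before, after, event) for before, event, after in annotations]
```

For example, three frames `f0`, `f1`, `f2` yield two dense transitions. A single annotation `("f0", "the cup is lifted", "f2")` adds a third, event-conditioned example. Because all three share one record shape, a single model could in principle be trained to predict `next_state` for every example under one loss.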

Why it matters

Orca was the most upvoted paper on HuggingFace Daily Papers on July 1, with 187 upvotes. It proposes a single model architecture spanning language, vision, and action, a step toward general world models and away from task-specific architectures.

Importance: 4/5

187 HF Daily Papers upvotes; general world model unifying text, vision, and embodied action in one architecture

multimodal world-models reasoning embodied-ai

Sources

official Orca: The World is in Your Mind (arXiv)

media HuggingFace Daily Papers (July 1, 2026)