SANA-WM: Minute-Scale 720p World Modeling on a Single GPU

NVIDIA

Research · official · 1 source · ~1 min read

SANA-WM (arXiv 2605.15178, 54 HF Daily upvotes) is a 2.6B-parameter world model that generates high-fidelity 720p video at minute scale with 6-DOF camera control. It uses hybrid linear attention to handle long sequences and a dual-branch system for camera control. The model generates 60-second clips on a single GPU, and distilled versions run on consumer hardware. Training took 15 days on 64 GPUs, which is significantly more efficient than comparable industrial systems.
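To illustrate why linear attention helps with long sequences, here is a minimal NumPy sketch of generic kernelized linear attention. It is not SANA-WM's implementation: the summary does not say how the hybrid is built, and the elu(x) + 1 feature map, the function names, and the tensor sizes are all assumptions chosen for illustration. The sketch shows that reordering the matrix products avoids forming the N x N score matrix, so cost grows linearly with sequence length N rather than quadratically.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed choice, not taken from the paper)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Kernelized attention in O(N * d^2): the N x N score matrix is never built."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                 # (d, d_v) summary of all keys and values
    z = k.sum(axis=0)            # (d,) summed keys, used for normalization
    return (q @ kv) / (q @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same kernel, computed the O(N^2 * d) way, for comparison only."""
    q, k = feature_map(q), feature_map(k)
    scores = q @ k.T             # (N, N) explicit score matrix
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

# Hypothetical sizes: 256 tokens, 16 channels per head.
rng = np.random.default_rng(0)
n, d = 256, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_linear = linear_attention(q, k, v)
out_quadratic = quadratic_reference(q, k, v)
print(np.allclose(out_linear, out_quadratic))
```

Both functions compute the same result; only the order of multiplication differs. A hybrid design would typically combine layers like this with some other attention type, but which layers and in what ratio is not stated in this summary.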

Why it matters

Generating 720p video at minute scale on a single GPU is a meaningful compute efficiency milestone. Prior work either required large clusters for quality or sacrificed quality for speed. The hybrid linear attention architecture points toward a scalable path for embodied AI simulation without dedicated infrastructure.

Importance: 3/5

54 HF Daily upvotes; single-GPU 60-second 720p video is a practical efficiency milestone for world models

Sources