World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Microsoft Research

Research | media only | 1 source | ~1 min read

Fine-tunes a text-to-video model with reinforcement learning (RL), using a reward signal based on 3D geometric consistency. The 3D-aware reward is reported to sharply improve temporal coherence without degrading visual quality.

Importance: 2/5

Backfilled from MD; not retroactively scored.

Sources

media arXiv