World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Microsoft Research
RL fine-tuning of text-to-video models with a reward signal based on 3D geometric consistency; the 3D-aware reward sharply improves temporal coherence without degrading visual quality.
Importance: 2/5
Backfilled from MD; not retroactively scored.
Sources
media
arXiv