DomainShuttle: Subject-Driven Text-to-Video Across In-Domain and Cross-Domain Scenarios

Research | official + media | 2 sources | ~1 min read

DomainShuttle is a text-to-video system for subject-driven synthesis across two scenarios: in-domain (precisely preserving the reference subject's features) and cross-domain (allowing flexible variation while retaining identity). It introduces three components: Domain-MoT (domain-aware adaptive layer normalization), Video-Reference DualRoPE (separate rotary position encoding for reference and video tokens), and a Cross-Pair Consistent Loss. The paper ranked third on Hugging Face Daily Papers for June 25, 2026 (34 upvotes).
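
To make the "separate rotary position encoding for reference and video tokens" idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not say how the two encodings are kept separate, so this sketch assumes the simplest reading, where standard RoPE is applied to both streams but reference tokens draw their positions from a disjoint index range. The names rope_rotate, dual_rope, and ref_offset are hypothetical and chosen only for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a standard rotary position embedding to one token vector.

    Each consecutive feature pair (i, i + 1) is rotated by the angle
    position * base ** (-i / dim), the usual RoPE frequency schedule.
    """
    dim = len(vec)
    if dim % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        a, b = vec[i], vec[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def dual_rope(video_tokens, ref_tokens, ref_offset=1000):
    """Encode video and reference tokens with separate position ranges.

    ASSUMPTION (not from the paper): video tokens use positions 0..T-1,
    while reference tokens use ref_offset..ref_offset+R-1, so the two
    streams never share a position index.
    """
    video = [rope_rotate(t, p) for p, t in enumerate(video_tokens)]
    refs = [rope_rotate(t, ref_offset + p) for p, t in enumerate(ref_tokens)]
    return video, refs


# Illustrative usage with toy 4-dimensional tokens.
video_in = [[1.0, 0.0, 1.0, 0.0] for _ in range(3)]
ref_in = [[1.0, 0.0, 1.0, 0.0]]
video_out, ref_out = dual_rope(video_in, ref_in)
```

Because RoPE is a pure rotation, token norms are unchanged, and attention scores between two tokens depend only on the difference of their positions. Under the assumption above, a reference token is therefore always "far" from every video frame in position space, regardless of which frame attends to it.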

Why it matters

Existing subject-driven video methods trade off fidelity against editability. DomainShuttle proposes architectural components intended to decouple these two objectives, so that a single model can both preserve the subject accurately and transfer it freely across domains.

Importance: 2/5

Novel architecture decoupling identity fidelity and domain flexibility in subject-driven video; 34 HF upvotes

text-to-video training generative-models paper

Sources

official arXiv:2606.26058 — DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation

media HuggingFace Daily Papers — June 25, 2026 (34 upvotes)