DomainShuttle: Subject-Driven Text-to-Video Across In-Domain and Cross-Domain Scenarios
A text-to-video system for subject-driven synthesis across two scenarios: in-domain, where the reference subject's features are preserved precisely, and cross-domain, where the subject can vary flexibly while retaining its identity. It introduces three components: Domain-MoT (domain-aware adaptive layer normalization), Video-Reference DualRoPE (separate rotary position encoding for reference and video tokens), and Cross-Pair Consistent Loss. Ranked third on HF Daily Papers for June 25 (34 upvotes).
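To make the "separate rotary position encoding" idea concrete, here is a minimal pure-Python sketch of one possible reading: video tokens and reference tokens are rotated with standard RoPE but draw their positions from disjoint index ranges. The function names, the `ref_offset` parameter, and the offset scheme itself are assumptions made for illustration; the summary does not say how DualRoPE actually assigns positions, and the paper's scheme may differ.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard rotary position encoding to an even-length vector.

    Each consecutive pair (vec[i], vec[i+1]) is rotated by an angle that
    grows with `pos` and shrinks with the pair index.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dual_positions(num_video, num_ref, ref_offset):
    """Hypothetical position assignment: video tokens use 0..num_video-1,
    reference tokens use a separate range starting at `ref_offset`.
    This is an illustrative assumption, not the paper's documented scheme.
    """
    video_pos = list(range(num_video))
    ref_pos = [ref_offset + j for j in range(num_ref)]
    return video_pos, ref_pos


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Example: 4 video tokens and 2 reference tokens in disjoint position ranges.
video_pos, ref_pos = dual_positions(num_video=4, num_ref=2, ref_offset=1000)
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
score = dot(rope(q, video_pos[1]), rope(k, ref_pos[0]))
```

Because RoPE attention scores depend only on the difference between positions, keeping reference tokens in their own range means a video token never sees a reference token at the same relative distance as a neighboring video frame. Whether DomainShuttle relies on this property is not stated in the summary.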
Why it matters
Existing subject-driven video methods trade off fidelity against editability. DomainShuttle proposes architectural components intended to decouple these objectives, enabling both accurate subject preservation and free domain transfer.
Importance: 2/5
Novel architecture decoupling identity fidelity and domain flexibility in subject-driven video; 34 HF upvotes