SCAIL-2: End-to-End Character Animation via In-Context Conditioning

Tsinghua University

SCAIL-2 (arXiv:2606.10804) eliminates intermediate representations (pose skeletons, background masks) in controlled character animation by concatenating driving videos directly into the generation sequence. Key components: MotionPair-60K (a new synthetic dataset), in-context mask conditioning, mode-specific rotary position embeddings (RoPE) for soft guidance, and Bias-Aware DPO to reduce synthetic artifacts. The authors report state-of-the-art results across multiple controlled animation tasks.
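To make "concatenating driving videos into the generation sequence" concrete, the toy sketch below joins driving-video tokens and target tokens into one flat sequence, tagging each token with its role and frame index. It is an illustration only, not SCAIL-2's implementation: the function name `build_in_context_sequence`, the dictionary token layout, and the scalar token values are all assumptions made for this example.

```python
from typing import Dict, List, Union

Token = Dict[str, Union[str, int, float]]


def build_in_context_sequence(
    driving_frames: List[List[float]],
    target_frames: List[List[float]],
) -> List[Token]:
    """Concatenate driving and target tokens into one sequence (hypothetical layout).

    Each inner list holds the tokens of one frame. The driving tokens come
    first, so a sequence model attending over the result can read the motion
    "in context" without a separate skeleton or mask input.
    """
    sequence: List[Token] = []
    for role, frames in (("driving", driving_frames), ("target", target_frames)):
        for frame_idx, frame in enumerate(frames):
            for value in frame:
                # The role tag stands in for any per-mode signal a real model
                # might use; the actual mechanism in the paper is not shown here.
                sequence.append({"role": role, "frame": frame_idx, "value": value})
    return sequence


# Two driving frames and two target frames, two tokens per frame.
driving = [[0.1, 0.2], [0.3, 0.4]]
target = [[0.0, 0.0], [0.0, 0.0]]
seq = build_in_context_sequence(driving, target)
print(len(seq), seq[0]["role"], seq[4]["role"])
```

The point of the sketch is that conditioning becomes a matter of sequence layout rather than a separate preprocessing stage: no pose estimator or segmentation step appears anywhere in the input path.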

Why it matters

Replacing the brittle intermediate-representation pipeline with end-to-end in-context conditioning simplifies production character animation workflows. Its 95 upvotes on Hugging Face Daily Papers suggest strong interest from the digital production and game development communities.

Importance: 2/5

Ranked second on Hugging Face Daily Papers for June 10 (95 upvotes); eliminates the brittle skeleton/mask pipeline for character animation.

video-generation multimodal diffusion character-animation

Sources

official arXiv:2606.10804 — SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning