Humanoid-GPT: Scaling to 2B Motion Frames Enables Zero-Shot Generalization in Humanoid Control

Research · official + media · 3 sources · ~1 min read

Humanoid-GPT (arXiv 2606.03985, CVPR 2026) trains a GPT-style causal Transformer for whole-body humanoid control on a 2-billion-frame motion corpus aggregated from seven datasets. The authors report that scaling both data and model capacity yields a single generative model that tracks highly dynamic motions and also generalizes zero-shot to unseen tasks. This removes the trade-off between agility and generalization that limited prior MLP-based trackers. Inference latency is under 1.5 ms on an RTX 4090. The paper also introduces Harmonic Motion Embedding (HME) to quantify motion diversity.
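"Causal" in a GPT-style Transformer means each position may attend only to itself and earlier positions, so the model predicts motion frame by frame without seeing the future. The sketch below shows that masking for a single attention head in NumPy. It is a generic illustration of causal self-attention, not the paper's architecture; the function name, frame count, feature size, and random weights are all hypothetical.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (T, d) sequence of frame features.

    Hypothetical illustration only; not Humanoid-GPT's actual model.
    Frame t may attend only to frames 0..t, which makes the model autoregressive.
    """
    T, _ = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Mask future positions (strict upper triangle) so they get zero weight.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax per row; the diagonal is always finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 8, 16  # hypothetical: 8 frames, 16-dim pose features
frames = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = causal_self_attention(frames, w_q, w_k, w_v)

# Perturbing the last frame must not change the outputs for earlier frames.
perturbed = frames.copy()
perturbed[-1] += 10.0
out2 = causal_self_attention(perturbed, w_q, w_k, w_v)
print(np.allclose(out[:-1], out2[:-1]))  # True
```

The mask is what lets a model of this kind train on whole sequences in parallel while still being queried one step at a time at inference.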

Why it matters

The paper reports clear GPT-style scaling laws for motion tracking, suggesting that the data-scaling recipe that worked for language models carries over directly to humanoid control. It was accepted at CVPR 2026 and received 18 upvotes on Hugging Face Daily Papers.
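A scaling law here means that loss or tracking error falls predictably, often as a power law, as data or model size grows. Assuming the power-law form common in language-model scaling studies (the paper's exact functional form and exponents are not given in this summary), such a law can be fitted by linear regression in log-log space. The function name is hypothetical, and the data points are synthetic, generated from chosen constants; they are not results from Humanoid-GPT.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares in log-log space.

    Hypothetical helper; the power-law form is an assumption, not the paper's.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # A line in log-log space: log L = log a - alpha * log N.
    slope, intercept = np.polyfit(log_n, log_l, 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free points from a = 5.0, alpha = 0.2 (not paper results).
num_frames = np.array([1e7, 1e8, 1e9, 2e9])
loss = 5.0 * num_frames ** -0.2
a, alpha = fit_power_law(num_frames, loss)
print(round(a, 3), round(alpha, 3))  # 5.0 0.2
```

With real, noisy measurements the same fit gives an estimate of the exponent, which is what makes extrapolation to larger corpora possible.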

Importance: 3/5

CVPR 2026 acceptance; reported scaling laws for humanoid motion tracking; 18 Hugging Face upvotes.

Sources