Learning while Deploying: Fleet-Scale Reinforcement Learning Turns Robot Deployment into Continuous Training
AGIBot
Introduces LWD (Learning While Deploying), a fleet-scale offline-to-online RL framework that turns robot deployment itself into a continuous training loop for Vision-Language-Action (VLA) generalist policies. A pre-trained policy is deployed across a robot fleet; autonomous rollouts and human interventions feed a shared replay buffer for iterative policy updates, so the policy adapts to real-world distribution shifts that static training datasets cannot cover. Appeared on Hugging Face Daily Papers on May 4.
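To make the data flow concrete, here is a minimal sketch of a fleet-shared replay buffer that keeps autonomous and human-intervention transitions in separate pools and mixes them when sampling a training batch. It is an illustration under stated assumptions, not the paper's implementation: `Transition`, `SharedReplayBuffer`, the two-pool layout, and the `intervention_fraction` parameter are all hypothetical names and design choices, and the summary does not say how LWD actually weights intervention data.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One step of experience reported by a robot (hypothetical schema)."""
    robot_id: int
    observation: tuple
    action: tuple
    reward: float
    human_intervention: bool  # True if an operator supplied the action


class SharedReplayBuffer:
    """Fixed-capacity buffer shared by every robot in the fleet.

    Assumption: autonomous and intervention data live in separate pools
    so a batch can contain a chosen share of each.
    """

    def __init__(self, capacity: int) -> None:
        # Each pool evicts its oldest entries once it reaches capacity.
        self._autonomous = deque(maxlen=capacity)
        self._intervention = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        pool = self._intervention if transition.human_intervention else self._autonomous
        pool.append(transition)

    def __len__(self) -> int:
        return len(self._autonomous) + len(self._intervention)

    def sample(self, batch_size: int, intervention_fraction: float,
               rng: random.Random) -> list:
        # Take up to the requested share from the intervention pool,
        # then fill the remainder from autonomous rollouts.
        n_int = min(round(batch_size * intervention_fraction), len(self._intervention))
        n_auto = min(batch_size - n_int, len(self._autonomous))
        return (rng.sample(list(self._intervention), n_int)
                + rng.sample(list(self._autonomous), n_auto))


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = SharedReplayBuffer(capacity=1000)
    # Three simulated robots report a mix of autonomous and corrected steps.
    for step in range(30):
        buffer.add(Transition(robot_id=step % 3, observation=(float(step),),
                              action=(0.0,), reward=0.0,
                              human_intervention=(step % 5 == 0)))
    batch = buffer.sample(batch_size=8, intervention_fraction=0.25, rng=rng)
    print(len(buffer), len(batch))
```

In a real system the sampled batch would feed an RL update and the refreshed policy would be pushed back to the fleet; both steps are omitted here because the summary gives no detail on them.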
Why it matters
One of the first published frameworks for fleet-level continuous RL post-training of generalist VLA policies at deployment scale, directly tackling the sim-to-real and distribution-shift problems that have limited practical robotics deployments. Treating the deployed fleet as a source of training data could significantly accelerate generalist robot learning in production.
Importance: 3/5
Novel fleet-scale RL framework for robotics on HF Daily Papers, addressing a core bottleneck in practical robot deployment.