Learning while Deploying: Fleet-Scale Reinforcement Learning Turns Robot Deployment into Continuous Training

AGIBot

Research | official | 1 source | ~1 min read

Introduces LWD (Learning While Deploying), a fleet-scale offline-to-online RL framework that turns robot deployment itself into a continuous training loop for generalist Vision-Language-Action (VLA) policies. A pre-trained policy is deployed across a robot fleet, and both autonomous rollouts and human interventions feed a shared replay buffer used for iterative policy updates. This lets the policy adapt to real-world distribution shifts that static training datasets cannot cover. Appeared on HuggingFace Daily Papers on May 4, 2026.
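The loop described above (a fleet collects experience, everything lands in one shared replay buffer, and the policy is updated from sampled batches) can be illustrated with a toy sketch. This is not the paper's implementation: the names `Transition`, `SharedReplayBuffer`, `collect_fleet_round`, and `run_loop` are hypothetical, the observations and rewards are random stand-ins, and the policy update is a placeholder counter rather than a real RL step.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One step of experience (hypothetical schema, not from the paper)."""
    robot_id: int
    observation: float
    action: float
    reward: float
    human_intervention: bool


class SharedReplayBuffer:
    """Fixed-capacity buffer shared by every robot in the fleet."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently evicts the oldest transition when full
        self._data = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        k = min(batch_size, len(self._data))
        return rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)


def collect_fleet_round(buffer, num_robots, steps_per_robot, rng,
                        intervention_rate=0.2):
    """Simulate one deployment round; all robots write to the same buffer."""
    for robot_id in range(num_robots):
        for _ in range(steps_per_robot):
            obs = rng.random()
            intervened = rng.random() < intervention_rate
            # Assumption: a human operator supplies the "ideal" action (== obs);
            # otherwise the stand-in policy acts randomly.
            action = obs if intervened else rng.random()
            reward = 1.0 - abs(action - obs)
            buffer.add(Transition(robot_id, obs, action, reward, intervened))


def run_loop(rounds=3, num_robots=4, steps_per_robot=5, batch_size=8, seed=0):
    """Alternate between fleet data collection and (placeholder) updates."""
    rng = random.Random(seed)
    buffer = SharedReplayBuffer(capacity=1000)
    updates = 0
    for _ in range(rounds):
        collect_fleet_round(buffer, num_robots, steps_per_robot, rng)
        batch = buffer.sample(batch_size, rng)
        if batch:
            # Placeholder: a real system would run an RL update on `batch`
            # and push the new policy weights back to the fleet.
            updates += 1
    return buffer, updates


if __name__ == "__main__":
    buf, n_updates = run_loop()
    print(len(buf), n_updates)
```

The point of the sketch is the data flow: autonomous and human-corrected transitions share one buffer and are flagged rather than stored separately, so an update step can treat them differently if it needs to.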

Why it matters

One of the first published frameworks for fleet-level continuous RL post-training of generalist VLA policies at deployment scale. It directly targets the sim-to-real gap and the distribution shift that have limited practical robot deployments. If the approach holds up, treating the deployed fleet as a source of training data could accelerate generalist robot learning in production.

Importance: 3/5

Novel fleet-scale RL framework for robotics on HF Daily Papers, addressing a core bottleneck in practical robot deployment.

rl paper

Sources

official HuggingFace Daily Papers — May 4, 2026