Flow-OPD: On-Policy Distillation Pushes GenEval +29 Points on Stable Diffusion 3.5


Flow-OPD is the first framework to integrate on-policy distillation into flow-matching text-to-image models. Its two-stage strategy first fine-tunes specialized teacher models with single-reward GRPO, then consolidates them into a single student via dense trajectory-level vector-field supervision with Manifold Anchor Regularization. On Stable Diffusion 3.5 Medium this lifts GenEval by 29 points (63→92) and OCR accuracy by 35 points (59→94), surpassing each individual teacher model.
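The stage-2 consolidation step can be sketched as a loss over the student's predicted vector field: match each specialized teacher's field along the sampled trajectory, plus an anchor term that keeps the student near the base model. This is a minimal illustration under assumptions; the function name, the `anchor_weight` parameter, and the exact form of Manifold Anchor Regularization are guesses, not the paper's implementation.

```python
import numpy as np

def distill_loss(student_v, teacher_vs, base_v, x_t, t, anchor_weight=0.1):
    """Hypothetical sketch of trajectory-level vector-field distillation.

    student_v, base_v, and each element of teacher_vs are callables
    (x_t, t) -> predicted vector field at noisy sample x_t, timestep t.
    """
    s = student_v(x_t, t)
    # Dense supervision: match the student's field to each specialized
    # teacher's field, averaged over teachers (multi-objective consolidation).
    distill = np.mean([np.mean((s - tv(x_t, t)) ** 2) for tv in teacher_vs])
    # "Manifold anchor" (assumed form): penalize drift from the base
    # model's field so the student stays near the pretrained manifold.
    anchor = np.mean((s - base_v(x_t, t)) ** 2)
    return distill + anchor_weight * anchor

# Toy usage with linear fields standing in for real SD3.5 networks.
x = np.ones(4)
f = lambda x_t, t: x_t            # student identical to the base model
g = lambda x_t, t: x_t + 1.0      # one specialized teacher
print(distill_loss(f, [g], f, x, t=0.5))
```

In a real training loop this loss would be evaluated at many (x_t, t) points sampled along denoising trajectories, which is what "trajectory-level" supervision refers to here.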

Why it matters

113 Hugging Face Daily upvotes. The work offers a principled approach to multi-objective RLHF alignment for diffusion models, a major open problem for production text-to-image systems that must satisfy competing objectives simultaneously.

Importance: 3/5

113 HF Daily upvotes; GenEval +29 on SD3.5 Medium via a principled multi-objective RLHF approach; first on-policy distillation framework for flow matching models.

Sources