Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

Rutgers University


Proposes Progressive On-Policy Critique Distillation (OPCD), in which a weak model acts as a critic that supplies revision directions rather than binary judgments (arXiv:2606.00424). The key insight is that a weak critic only needs to offer non-misleading directions for improvement, not correct final answers, so the strong model can draw on its own knowledge to improve itself. The method filters for high-quality critiques and distills the critic-guided behavior into the strong model through adaptive self-teaching. The paper reports improvements on reasoning and alignment benchmarks across training iterations.
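To make the loop described above concrete, here is a minimal Python sketch of one data-collection round. It is an illustration under stated assumptions, not the paper's implementation: the function names (`collect_distillation_data`, `generate`, `critique`, `revise`, `score`), the `min_gain` threshold, and the "keep a critique only if the revision scores higher than the draft" filter are all hypothetical stand-ins for what the summary calls filtering high-quality critiques. The toy models at the bottom exist only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DistillExample:
    prompt: str
    target: str  # critic-guided revision the strong model is later trained to produce unaided


def collect_distillation_data(
    prompts: List[str],
    generate: Callable[[str], str],          # strong model: prompt -> on-policy draft
    critique: Callable[[str, str], str],     # weak critic: (prompt, draft) -> revision direction
    revise: Callable[[str, str, str], str],  # strong model: (prompt, draft, direction) -> revision
    score: Callable[[str, str], float],      # hypothetical quality proxy: (prompt, answer) -> score
    min_gain: float = 0.0,                   # hypothetical filter threshold
) -> List[DistillExample]:
    """Collect (prompt, revision) pairs whose critique led to a measurable gain."""
    kept = []
    for prompt in prompts:
        draft = generate(prompt)
        direction = critique(prompt, draft)
        revision = revise(prompt, draft, direction)
        # Assumed filter: discard critiques that did not improve the answer.
        if score(prompt, revision) - score(prompt, draft) > min_gain:
            kept.append(DistillExample(prompt, revision))
    return kept


# --- Toy stand-ins (not real models), only to exercise the loop ---
ANSWERS = {"2+2": "2+2 = 4", "3+3": "3+3 = 6"}


def toy_generate(prompt: str) -> str:
    return "?"  # the strong model's first draft is unhelpful


def toy_critique(prompt: str, draft: str) -> str:
    # The weak critic gives a direction, not an answer, and only for one prompt.
    return "show the arithmetic" if prompt == "2+2" else ""


def toy_revise(prompt: str, draft: str, direction: str) -> str:
    # The strong model uses its own knowledge once nudged; otherwise keeps the draft.
    return ANSWERS[prompt] if direction else draft


def toy_score(prompt: str, answer: str) -> float:
    return 1.0 if "=" in answer else 0.0


data = collect_distillation_data(
    ["2+2", "3+3"], toy_generate, toy_critique, toy_revise, toy_score
)
```

In this toy run only the prompt that received a usable direction survives the filter. In a real pipeline the kept pairs would feed a fine-tuning step, and the round would be repeated with the updated strong model.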

Why it matters

Scalable oversight is a central alignment challenge: as models grow more capable, supervision from humans and weaker models becomes insufficient. OPCD offers a practical path in which cheap weak critics bootstrap stronger models without needing to fully understand the task; the critic only has to point in a better direction. It addresses the same problem as constitutional AI and debate, but from a distillation angle.

Importance: 2/5

Notable scalable oversight paper with practical implications for training pipelines; addresses a core alignment challenge.

Sources