AI Digest

#policy-optimization

3 items

  • 10 Jun · DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning · Tencent Hunyuan research
  • 10 Jun · Flow-DPPO: Principled RL Alignment for Flow Matching Image and Video Models · Tencent Hunyuan research
  • 17 Jun · ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners · NVIDIA research

ai-digest.kerby.pro

© 2026 Alexei Lukin · CC BY 4.0
