AI
AI Digest
EN RU
Home Archive About RSS

#reward-modeling

2 items

  • 11 июн Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF Alibaba research
  • 3 июн QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains research

ai-digest.kerby.pro

© 2026 Alexei Lukin · CC BY 4.0

RSS · JSON Feed · About