#reward-modeling
2 items

Jun 11
Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF
Alibaba research

Jun 3
QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains
research