#rlvr 2 items 17 июн VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL WeiboAI research 10 июн DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning Tencent Hunyuan research