#reward-hacking
2 items

May 3
Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training
research

May 3
OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations
OpenAI, research