#reward-hacking
- Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training research
- OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations OpenAI research
- OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x OpenAI research