AI Digest

#reward-hacking

3 items

  • May 3 · Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training · research
  • May 3 · OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations · OpenAI, research
  • May 6 · OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x · OpenAI, research

ai-digest.kerby.pro

© 2026 Alexei Lukin · CC BY 4.0
