#monitorability 1 item 9 мая OpenAI Discloses Accidental Chain-of-Thought Grading in RL Training Across Six Models OpenAI research