#chain-of-thought 1 item 9 мая OpenAI Discloses Accidental Chain-of-Thought Grading in RL Training Across Six Models OpenAI research