mech-interp — AI Digest

May 18 Judge Circuits: Mechanistic Explanation of LLM-as-Judge Format Inconsistency research
June 11 Anatomy of Post-Training: Using Interpretability to Audit and Fix Preference Data research