mechanistic-interpretability — AI Digest

8 May · Natural Language Autoencoders: Turning Claude's Thoughts into Text · Anthropic · research
10 May · Anthropic Introduces Natural Language Autoencoders for Scalable LLM Interpretability · Anthropic · research
8 May · Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix · research