Judge Circuits: Mechanistic Explanation of LLM-as-Judge Format Inconsistency


Researchers applied causal circuit analysis to Gemma-3, Qwen2.5, and Llama-3 to explain why LLM judges produce inconsistent scores across output formats (e.g., a 1–5 scale vs. True/False). They identified a sparse 'Latent Evaluator' sub-graph in mid-to-late layers that is shared across tasks. In their account, this sub-graph carries a single continuous judgment signal, which is then routed through fragile, format-specific terminal branches; the fragility of those branches explains the format-driven score variance (arXiv:2605.16023).

Why it matters

LLM-as-judge is standard in evaluation pipelines, yet its reliability is poorly understood mechanistically. This is the first circuit-level account of why the same model's judgment diverges by format, which makes it directly relevant to calibrating automated evaluation systems.
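
For pipeline owners, one practical way to see the format inconsistency described above is to score the same items in two formats and count how often the verdicts disagree. The sketch below is illustrative only and is not taken from the paper: the function names, the linear mapping of 1–5 scores onto [0, 1], and the 0.5 pass threshold are all assumptions chosen for the example.

```python
from typing import Sequence


def likert_to_unit(score: int, low: int = 1, high: int = 5) -> float:
    """Map a Likert score in [low, high] linearly onto [0, 1] (assumed mapping)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


def format_disagreement(
    likert_scores: Sequence[int],
    bool_verdicts: Sequence[bool],
    threshold: float = 0.5,  # hypothetical pass threshold on the [0, 1] scale
) -> float:
    """Fraction of items where the thresholded 1-5 score and the
    True/False verdict from the same judge disagree."""
    if len(likert_scores) != len(bool_verdicts):
        raise ValueError("both formats must cover the same items")
    if not likert_scores:
        raise ValueError("need at least one item")
    flips = [
        (likert_to_unit(score) >= threshold) != verdict
        for score, verdict in zip(likert_scores, bool_verdicts)
    ]
    return sum(flips) / len(flips)


# Made-up judge outputs for five items, scored once per format.
likert = [5, 4, 2, 3, 1]
verdicts = [True, False, False, True, False]
print(format_disagreement(likert, verdicts))  # 0.2 (only the second item flips)
```

A disagreement rate well above zero on a held-out set would suggest reporting results per format, or fixing a single format, rather than pooling judge scores across formats.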

Importance: 3/5

First mechanistic circuit analysis explaining LLM-as-judge format inconsistency — directly actionable for evaluation pipelines

interpretability mech-interp benchmark evaluation

Sources

official Judge Circuits (arXiv:2605.16023)