PerceptionRubrics: Atomic Rubric Evaluation Reveals 8% Perception Gap Between Open and Closed Models
Johns Hopkins University researchers present PerceptionRubrics (ICML 2026), a benchmark that pairs more than 1,000 visually dense images with 12,004 atomic evaluation rubrics, split into Must-Right and Easy-Wrong criteria. Rather than averaging scores across criteria, a gated binary scoring mechanism penalizes any failure on a mandatory visual element. Key finding: an 8% perception gap persists between open-source frontier models and proprietary leaders.
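A minimal Python sketch of how gated scoring differs from plain averaging. The `Criterion` type, the `must_right`/`easy_wrong` labels, and the rule that a passing item's score is its pass rate over Easy-Wrong criteria are illustrative assumptions, not the paper's published formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    # Hypothetical labels: "must_right" (mandatory) or "easy_wrong" (error-prone detail).
    kind: str
    # Binary pass/fail judgment for this atomic rubric.
    passed: bool


def averaged_score(criteria: List[Criterion]) -> float:
    """Baseline: mean pass rate over all criteria, ignoring criterion type."""
    if not criteria:
        return 0.0
    return sum(c.passed for c in criteria) / len(criteria)


def gated_score(criteria: List[Criterion]) -> float:
    """Gated sketch (assumed rule): any failed Must-Right criterion zeroes the item;
    otherwise the score is the pass rate over Easy-Wrong criteria."""
    if any(c.kind == "must_right" and not c.passed for c in criteria):
        return 0.0
    easy = [c for c in criteria if c.kind == "easy_wrong"]
    if not easy:
        # All mandatory criteria passed and nothing else to check.
        return 1.0
    return sum(c.passed for c in easy) / len(easy)


if __name__ == "__main__":
    # One mandatory element missed, everything else correct.
    item = [
        Criterion("must_right", False),
        Criterion("must_right", True),
        Criterion("easy_wrong", True),
        Criterion("easy_wrong", True),
    ]
    print(averaged_score(item))  # 0.75
    print(gated_score(item))     # 0.0
```

In this toy item, averaging still awards 0.75 despite a missed mandatory element, while the gate drops the score to 0.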
Why it matters
Standard multimodal benchmarks inflate scores by averaging over components, so a model can miss a mandatory visual element and still score well. PerceptionRubrics exposes this brittleness in visually rich domains, and its scores correlate better with human judgment.
Importance: 3/5
ICML 2026 acceptance; quantifies an 8% perception gap between open-source and proprietary frontier models; 35 Hugging Face Daily Papers upvotes