PerceptionRubrics: Atomic Rubric Evaluation Reveals 8% Perception Gap Between Open and Closed Models

Johns Hopkins University researchers present PerceptionRubrics (ICML 2026), a benchmark that pairs 1,000+ visually dense images with 12,004 atomic evaluation rubrics split into Must-Right and Easy-Wrong criteria. Its gated binary scoring mechanism penalizes failures on mandatory visual elements instead of letting them be averaged away by passes elsewhere. Key finding: an 8% perception gap persists between open-source frontier models and proprietary leaders.
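To make the contrast between gated and averaged scoring concrete, here is a minimal sketch. The summary does not give the paper's exact formula, so everything here is an assumption for illustration: the names `RubricResult`, `averaged_score`, and `gated_score` are hypothetical, and the gate is modeled as "any failed Must-Right rubric zeroes the response's score."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricResult:
    """One atomic rubric judged as a binary pass/fail (hypothetical structure)."""
    kind: str      # "must_right" or "easy_wrong"
    passed: bool


def averaged_score(results: list[RubricResult]) -> float:
    """Plain averaging: the fraction of rubrics passed, regardless of kind."""
    if not results:
        raise ValueError("at least one rubric result is required")
    return sum(r.passed for r in results) / len(results)


def gated_score(results: list[RubricResult]) -> float:
    """Assumed gate: any failed Must-Right rubric zeroes the score.

    This illustrates the gating idea only; it is not the paper's
    confirmed scoring rule.
    """
    if not results:
        raise ValueError("at least one rubric result is required")
    gate_failed = any(r.kind == "must_right" and not r.passed for r in results)
    return 0.0 if gate_failed else averaged_score(results)


# Example: one mandatory element is missed, everything else is right.
results = [
    RubricResult("must_right", True),
    RubricResult("must_right", False),
    RubricResult("easy_wrong", True),
    RubricResult("easy_wrong", True),
]

print(averaged_score(results))  # 0.75
print(gated_score(results))     # 0.0
```

Under these assumptions, a response that misses one mandatory element still earns 0.75 when scores are averaged but 0.0 when gated, which is the kind of inflation the benchmark is designed to expose.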

Why it matters

Standard multimodal benchmarks inflate scores by averaging over components, which hides failures on critical details. PerceptionRubrics exposes this brittleness in visually rich domains and is reported to correlate better with human judgment.

Importance: 3/5

Accepted at ICML 2026; quantifies an 8% perception gap between open-source and proprietary frontier models; 35 upvotes on Hugging Face Daily Papers.

multimodal evaluation benchmark vision-language icml-2026

Sources

PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception (arXiv, official source)