Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops


Investigates how cross-modal evaluator bias propagates in self-evolving agent loops that use LLMs as judges. The MM-EPC framework shows that when GPT-4o evaluates DeepSeek-chat across modalities, a single strategy can monopolize nearly half the reward signal, an effect termed 'cross-modal contagion'. Cross-model evaluation is the primary risk factor; self-evaluation shows near-complete immunity. Validated with ~35,000 API calls.
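
To make "monopolizing nearly half the reward signal" concrete, here is a minimal sketch of how one might measure each strategy's share of the total reward handed out by a judge model. It is not the paper's metric or code: the function names, the strategy labels, and the toy scores are all assumptions made up for illustration.

```python
from collections import defaultdict


def reward_shares(scored):
    """Return each strategy's fraction of the total reward.

    `scored` is an iterable of (strategy_name, reward) pairs, where each
    reward is a non-negative score assigned by the judge model.
    (Hypothetical helper, not taken from the paper.)
    """
    totals = defaultdict(float)
    for strategy, reward in scored:
        totals[strategy] += reward
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No reward was given at all, so every share is zero.
        return {name: 0.0 for name in totals}
    return {name: value / grand_total for name, value in totals.items()}


def dominant_strategy(shares):
    """Return (name, share) for the strategy with the largest share."""
    return max(shares.items(), key=lambda item: item[1])


# Toy scores, invented for illustration only (not data from the paper).
toy_scores = [
    ("strategy_a", 4.0), ("strategy_a", 5.0),
    ("strategy_b", 3.0), ("strategy_c", 3.0),
    ("strategy_d", 3.0),
]
shares = reward_shares(toy_scores)
name, share = dominant_strategy(shares)
print(name, round(share, 2))  # prints: strategy_a 0.5
```

In a loop of this kind, one would presumably track the dominant share per generation and per modality; a share that climbs toward one half under a cross-model judge but stays flat under self-evaluation would match the pattern the summary describes.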

Why it matters

As self-improving agents proliferate, understanding how the choice of evaluator corrupts reward signals is critical. The finding that self-evaluation largely avoids contagion creates a concrete design trade-off for RLHF and agent-evolution pipelines.

Importance: 2/5

Identifies a concrete failure mode in LLM-as-judge evaluation for self-evolving agents with empirical validation at scale.

evaluation, agents, multimodal, alignment, paper

Sources

official arXiv:2606.16682 — Multimodal Evaluator Preference Collapse