Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops
Investigates how cross-modal evaluator bias propagates through self-evolving agent loops that use LLMs as judges. The MM-EPC framework shows that when GPT-4o evaluates DeepSeek-chat across modalities, a single strategy can monopolize nearly half the reward signal, an effect termed 'cross-modal contagion'. Cross-model evaluation is the primary risk factor; self-evaluation shows near-complete immunity. Validated with ~35,000 API calls.
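To make "monopolizing the reward signal" concrete, here is a minimal sketch of one way to measure how concentrated a judge's rewards are across strategies. It is an illustration only, not the MM-EPC framework's own metric: the function names, the strategy labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def reward_shares(scored):
    """Return each strategy's fraction of the total judge-assigned reward.

    `scored` is an iterable of (strategy_name, reward) pairs. This input
    format is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for strategy, reward in scored:
        totals[strategy] += reward
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No reward was handed out, so no strategy holds any share.
        return {s: 0.0 for s in totals}
    return {s: r / grand_total for s, r in totals.items()}


def top_share(scored):
    """Return (strategy, share) for the strategy with the largest share."""
    shares = reward_shares(scored)
    if not shares:
        return None, 0.0
    name = max(shares, key=shares.get)
    return name, shares[name]


# Hypothetical judge scores, invented for illustration (not data from the paper).
scored = [("A", 4.0), ("B", 1.0), ("C", 2.0), ("A", 1.0), ("D", 2.0)]
print(top_share(scored))
```

A top share approaching one half, as in this toy input, is the kind of concentration the summary describes. Tracking this value per evaluator and evolution round would be one way to watch for collapse in a pipeline.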
Why it matters
As self-improving agents proliferate, understanding how evaluator choice corrupts reward signals is critical. The finding that self-evaluation avoids contagion creates a concrete design trade-off for RLHF and agent-evolution pipelines.
Importance: 2/5
Identifies a concrete failure mode in LLM-as-judge evaluation for self-evolving agents with empirical validation at scale.