Direct Corpus Interaction: Rethinking Retrieval for Agentic Search

TIGER-Lab


This paper challenges the assumption that vector-similarity retrieval is optimal for language agents. Direct Corpus Interaction (DCI) lets agents search raw corpora with general-purpose tools such as grep and file reads. This enables exact lexical constraints, multi-step hypothesis refinement, and verification against local context. DCI substantially outperforms strong sparse, dense, and reranking baselines on the BRIGHT and BEIR benchmarks, without requiring offline indexing or specialized retrieval APIs.
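The three capabilities named above can be illustrated with a minimal Python sketch. It assumes a corpus stored as plain `.txt` files under one directory. The `grep` and `read_window` functions are hypothetical stand-ins written for this example, loosely mimicking `grep -rn` and a windowed file read; they are not the paper's actual tool interface.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical tools for illustration only; the paper's interface may differ.

def grep(corpus_dir, pattern, files=None):
    """Return (path, line_no, line) for every line matching a regex,
    similar to `grep -rn`. If `files` is given, search only those files."""
    regex = re.compile(pattern)
    paths = files if files is not None else sorted(Path(corpus_dir).rglob("*.txt"))
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for i, line in enumerate(f, start=1):
                if regex.search(line):
                    hits.append((Path(path), i, line.rstrip("\n")))
    return hits

def read_window(path, line_no, radius=2):
    """Return the 1-indexed line `line_no` plus `radius` lines on each side."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    return "\n".join(lines[start:end])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.txt").write_text("intro\nThe BM25 baseline uses k1=1.2.\nend\n")
        Path(d, "b.txt").write_text("BM25 is a sparse retriever.\nNo parameters here.\n")
        # Step 1: exact lexical constraint (whole-word match on "BM25").
        hits = grep(d, r"\bBM25\b")
        # Step 2: refine the hypothesis by searching only the files already hit.
        files = sorted({p for p, _, _ in hits})
        refined = grep(d, r"k1=\d", files=files)
        # Step 3: verify each surviving match in its local context.
        for path, n, _ in refined:
            print(path.name, n)
            print(read_window(path, n, radius=1))
```

Nothing in this sketch is precomputed: every call scans the raw files. That is what allows exact constraints such as `k1=\d`, but it also means the cost of each search grows with corpus size.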

Why it matters

55 HF Daily Papers upvotes. The paper challenges the dominant RAG paradigm with evidence that agents given direct, filesystem-style access to a corpus outperform dedicated retrieval pipelines, a practical finding with implications for RAG system design.

Importance: 2/5

Sources