Direct Corpus Interaction: Rethinking Retrieval for Agentic Search
TIGER-Lab
This paper challenges the assumption that vector-similarity retrieval is optimal for language agents. Direct Corpus Interaction (DCI) lets agents search raw corpora with general-purpose tools such as grep and file reads, which enables exact lexical constraints, multi-step hypothesis refinement, and local context verification. The paper reports that DCI substantially outperforms strong sparse, dense, and reranking baselines on the BRIGHT and BEIR benchmarks, without requiring offline indexing or specialized retrieval APIs.
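To make the idea of "general-purpose tools" concrete, here is a minimal Python sketch of the two kinds of tools the summary mentions: a grep-style search and a file read. It is an illustration under stated assumptions, not the paper's implementation. The function names grep_corpus and read_span, the one-document-per-.txt-file layout, and the max_hits cap are all hypothetical choices made for this example.

```python
import re
from pathlib import Path


def grep_corpus(root, pattern, max_hits=20, ignore_case=True):
    """Grep-style search: return (path, line_number, line) for matching lines.

    Assumption for this sketch: the corpus is a directory tree of UTF-8
    .txt files. No index is built; every call scans the raw files.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    # Sort paths so results are deterministic across runs.
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    hits.append((str(path), lineno, line.rstrip("\n")))
                    if len(hits) >= max_hits:
                        return hits  # cap output so the agent's context stays small
    return hits


def read_span(path, start, end):
    """File read: return lines start..end (1-indexed, inclusive).

    An agent could call this on a grep hit to check the surrounding
    context before accepting the document as relevant.
    """
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[max(start - 1, 0):end]
```

In an agent loop, the model would typically call grep_corpus with a broad pattern, tighten the pattern based on the hits (the multi-step hypothesis refinement described above), and then call read_span around a promising hit to verify it in context. How the paper's agents actually sequence these calls is not specified in this summary.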
Why it matters
55 HF Daily Papers upvotes; challenges the dominant RAG paradigm with evidence that agents using direct filesystem-style corpus access outperform dedicated retrieval pipelines.
Importance: 2/5
55 HF Daily Papers upvotes; the practical finding that agent-native direct corpus tools outperform vector retrieval has implications for RAG system design.