Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement
NLPIR Lab
Arbor introduces a framework for fully autonomous ML research. An LLM-based coordinator manages a persistent Hypothesis Tree linking hypotheses, experimental artifacts, and learned insights. Executor agents test individual hypotheses in isolated sandboxes, so knowledge accumulates across many experimental rounds rather than being discarded after each run. On MLE-Bench Lite, Arbor reaches an 86.36% Any Medal score, and its relative held-out gains are more than 2.5× those of both Codex and Claude Code under identical compute budgets.
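To make the coordinator/executor split concrete, here is a minimal Python sketch of how a persistent hypothesis tree could be represented. It is not Arbor's implementation: the names HypothesisNode, refine, lineage_insights, untested, and run_round are hypothetical, the executor is a plain callable standing in for a sandboxed agent, and the coordinator logic is reduced to "test every node that has no score yet".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class HypothesisNode:
    """One node of an illustrative hypothesis tree (all names are assumptions)."""
    statement: str
    parent: Optional["HypothesisNode"] = None
    children: List["HypothesisNode"] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)  # e.g. log or checkpoint paths
    insight: Optional[str] = None   # what was learned from testing this node
    score: Optional[float] = None   # None until an executor has tested it

    def refine(self, statement: str) -> "HypothesisNode":
        """Attach a more specific child hypothesis and return it."""
        child = HypothesisNode(statement=statement, parent=self)
        self.children.append(child)
        return child

    def lineage_insights(self) -> List[str]:
        """Insights recorded from the root down to this node, in that order."""
        node, collected = self, []
        while node is not None:
            if node.insight:
                collected.append(node.insight)
            node = node.parent
        return list(reversed(collected))


def untested(root: HypothesisNode) -> List[HypothesisNode]:
    """Depth-first list of nodes that have not been scored yet."""
    stack, result = [root], []
    while stack:
        node = stack.pop()
        if node.score is None:
            result.append(node)
        stack.extend(reversed(node.children))
    return result


# An executor takes (statement, prior insights) and returns (score, artifact, insight).
Executor = Callable[[str, List[str]], Tuple[float, str, str]]


def run_round(root: HypothesisNode, executor: Executor) -> int:
    """Test every untested node once and store the results on the tree."""
    tested = 0
    for node in untested(root):
        score, artifact, insight = executor(node.statement, node.lineage_insights())
        node.score = score
        node.artifacts.append(artifact)
        node.insight = insight
        tested += 1
    return tested
```

Because results are written back onto the nodes, a later round can pass the insights of a node's ancestors to the executor instead of starting from scratch, which is the accumulation property the summary describes.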
Why it matters
30 upvotes on HuggingFace as of June 11. A concrete step toward AI systems that conduct sustained, compounding scientific research. The 2.5× advantage over Codex and Claude Code on a standardized ML engineering benchmark is a strong empirical signal for autonomous research agents.
Importance: 3/5
Notable research paper; Hypothesis Tree framework for autonomous research; 2.5× improvement over Codex/Claude Code on MLE-Bench Lite.