#benchmark
2 items

30 Apr | Programming with Data: test-driven data engineering for self-improving LLMs | OpenDataLab | research
1 May | AutoResearchBench: a benchmark for autonomous scientific literature search by AI agents | BAAI | research