Daily digest

May 1, 2026

5 items · ~5 min · Week 2026-W18

Worth knowing (2)

Models / LLM official + media 2 src. ~1 min

On April 30, 2026, Baidu unveiled ERNIE-5.1-Preview, a preview of the upcoming ERNIE 5.1. The model debuted at #13 on the global LMArena Text Arena leaderboard with a score of 1476, becoming the top-ranked Chinese model and overtaking DeepSeek-V4-Pro. According to Baidu, the model uses roughly one-third of the total parameters and half the active parameters of ERNIE-5.0, at approximately 6% of the pre-training cost of comparable models. The full ERNIE 5.1 release is expected at the Baidu Create conference.

Why it matters

Underscores how sharply the race among Chinese labs has accelerated since DeepSeek V4: Baidu now leads Chinese models on LMArena and claims to have done so at a substantially lower training cost.

#baidu #ernie #china #lmarena #preview

Tools official 2 src. ~1 min

OpenAI shipped Codex CLI v0.128.0, a stable release following a series of 0.126.x alphas. The headline feature is persisted /goal workflows: long-running goals are stored via the app-server API, exposed to the model as tools, can be continued at runtime, and have dedicated TUI controls. Permission profiles have been expanded with built-in defaults and sandbox-profile selection directly from the CLI; the --full-auto flag is deprecated in favor of explicit permission profiles. Plugin workflows are improved (marketplace install, remote-bundle cache), and sessions from external agents can now be imported, including in the background. MultiAgentV2 gained configurable thread caps and wait times.

Why it matters

Persisted /goal workflows turn Codex CLI from a stateless helper into a platform for long-running autonomous tasks, putting it in direct competition with the background agents of Claude Code and Cursor.

#codex #openai #coding-agent #cli

For reference (3)

Research official + media 2 src. ~1 min

AutoResearchBench, a new benchmark, evaluates agents on autonomous scientific literature search and review. It includes two complementary setups: Deep Research (multi-step investigation leading to a specific target paper) and Wide Research (exhaustive collection of publications matching given criteria, scored by intersection over union, or IoU). Even the strongest LLM agents reach only 9.39% accuracy on Deep Research and 9.31% IoU on Wide Research.
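To make the Wide Research metric concrete, here is a minimal Python sketch of set-level IoU. It assumes the score compares the set of papers an agent returns with a gold set of matching papers (size of the intersection divided by size of the union); the function name, the empty-set convention, and the paper IDs are hypothetical and not taken from the benchmark.

```python
def set_iou(predicted: set[str], gold: set[str]) -> float:
    """Intersection over union of two sets of paper identifiers (hypothetical helper)."""
    union = predicted | gold
    if not union:
        # Both sets empty: treated as a perfect match (an assumed convention).
        return 1.0
    return len(predicted & gold) / len(union)


# Hypothetical example: the agent returns 4 papers, 2 of which are in a 6-paper gold set.
agent_found = {"paper-A", "paper-B", "paper-X", "paper-Y"}
gold_set = {"paper-A", "paper-B", "paper-C", "paper-D", "paper-E", "paper-F"}

# Intersection = 2 papers, union = 4 + 6 - 2 = 8 papers, so IoU = 2 / 8.
print(f"{set_iou(agent_found, gold_set):.2%}")  # prints 25.00%
```

Because IoU penalizes both missed papers and spurious extras, a low score such as 9.31% can reflect incomplete recall, poor precision, or both.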

Why it matters

Closes a methodological gap between general-purpose web-agent evaluations and the actual work of a researcher; the ~9% scores set a baseline against which progress on research agents can be measured throughout 2026.

#agents #benchmark #rag #evaluation

Tools official 2 src. ~1 min

Anthropic released Claude Code 2.1.126. A new `claude project purge [path]` command fully wipes a project's state (transcripts, tasks, file history, config). When ANTHROPIC_BASE_URL is set, the model picker now pulls the model list from a compatible gateway's /v1/models endpoint. The --dangerously-skip-permissions flag now actually bypasses confirmation prompts for writes to protected paths (.claude/, .git/, .vscode/). Regressions in allowManagedDomainsOnly/allowManagedReadPathsOnly have been fixed, and images larger than 2000px are now automatically downscaled on paste.

Why it matters

A cumulative fix release that closes several security regressions in the permission allowlist and streamlines work through enterprise gateways.

#claude-code #anthropic #coding-agent #cli

Tools official 1 src. ~1 min

SST released opencode v1.14.31 (May 1, 2026). It adds an interactive Azure setup that prompts for the resource name and saves the API key. Task child sessions now inherit permissions from the parent session. Clearer errors are surfaced for invalid remote MCP URLs. A Desktop app crash when restoring sessions with missing models has been fixed.

Why it matters

One of the few open-source coding agents actively keeping pace with Claude Code and Codex on features; releases ship same-day.

#opencode #sst #coding-agent #cli

Worth knowing (2)

Baidu releases ERNIE-5.1-Preview: #1 Chinese model on LMArena

OpenAI Codex CLI 0.128.0: persisted /goal workflows and expanded permission profiles

For reference (3)

AutoResearchBench: a benchmark for autonomous scientific literature search by AI agents

Claude Code 2.1.126: project purge, model picker via gateway, security fixes

opencode v1.14.31: interactive Azure setup and permission inheritance for task sessions