AI Digest

A daily roundup of significant releases and events in AI, with an emphasis on source verifiability.

Baidu releases ERNIE-5.1-Preview — #1 Chinese model on LMArena

Baidu
Models / LLM official + media 2 src. ~1 min

On April 30, 2026, Baidu unveiled ERNIE-5.1-Preview. The model debuted at #13 on the global LMArena Text Arena leaderboard with a score of 1476, becoming the top-ranked Chinese model and overtaking DeepSeek-V4-Pro. According to Baidu, the model uses roughly one-third the total parameters and half the active parameters of ERNIE-5.0, at approximately 6% of the pre-training cost of comparable models. The full ERNIE 5.1 release is expected at the Baidu Create conference.

Why it matters
Confirms the sharp acceleration of the Chinese race following DeepSeek V4: Baidu claims leadership among Chinese labs on LMArena at substantially lower training cost.

OpenAI Codex CLI 0.128.0 — persisted /goal workflows and expanded permission profiles

OpenAI
Tools official 2 src. ~1 min

OpenAI shipped a stable release of Codex CLI v0.128.0 following a series of 0.126.x alphas. The headline feature is persisted /goal workflows: long-running goals are stored via the app-server API, exposed as model tools, support runtime continuation, and have dedicated TUI controls. Permission profiles have been expanded with built-in defaults and sandbox-profile selection directly from the CLI; the --full-auto flag is deprecated in favor of explicit permission profiles. Plugin workflows are improved (marketplace install, remote-bundle cache), and external-agent session import with background import has been added. MultiAgentV2 gained configurable thread caps and wait times.

Why it matters
Persisted /goal turns Codex CLI from a stateless helper into a platform for long-lived autonomous tasks, competing with Claude Code and Cursor for background agents.

AutoResearchBench — a benchmark for autonomous scientific literature search by AI agents

BAAI
Research official + media 2 src. ~1 min

BAAI published a new benchmark for evaluating agents on autonomous scientific literature search and review. It includes two complementary setups: Deep Research (a multi-step investigation leading to a specific target paper) and Wide Research (exhaustive collection of publications matching given criteria, scored by intersection-over-union, IoU). Even the strongest LLM agents reach only 9.39% accuracy on Deep Research and 9.31% IoU on Wide Research.
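The Wide Research score is standard set IoU over collected papers. A minimal sketch of how such a score is computed; the function name and paper-ID scheme are illustrative, not the benchmark's own scorer:

```python
def wide_research_iou(retrieved: set[str], gold: set[str]) -> float:
    """IoU between the papers an agent collected and the gold set."""
    if not retrieved and not gold:
        return 1.0  # both empty: treat as perfect agreement by convention
    return len(retrieved & gold) / len(retrieved | gold)

# An agent that finds 3 of 5 gold papers plus 1 off-target paper:
score = wide_research_iou(
    {"paper-001", "paper-002", "paper-003", "paper-099"},
    {"paper-001", "paper-002", "paper-003", "paper-004", "paper-005"},
)
# 3 shared papers / 6 papers in the union = 0.5
```

Under this metric, both missed gold papers and irrelevant extras in the retrieved set pull the score down, which is why exhaustive-collection tasks are so punishing for current agents.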

Why it matters
Closes a methodological gap between general-purpose web agents and the actual work of a researcher; the ~9% figures set a baseline against which progress on research agents can be measured throughout 2026.
Full issue →

DeepSeek V4: official open-source release with Day-0 adaptation for Huawei Ascend

DeepSeek
Models / LLM official + media 5 src. ~1 min

DeepSeek officially released the V4 lineup as open source under the MIT license on April 29. It includes DeepSeek-V4-Pro at 1.6T parameters (49B active) and DeepSeek-V4 at 284B (13B active) — both MoE models with native 1M token context. The release claims roughly a 9.5x reduction in memory requirements versus V3.2 and a near-closed gap with frontier closed models on reasoning benchmarks. A defining feature of the release is optimization for Chinese accelerators: Huawei Ascend, Cambricon, Hygon, and Moore Threads completed Day-0 adaptation on release day, with multi-deploy on Ascend 950 expected in the second half of the year.

Why it matters
The first major frontier open-weights release purpose-built for Ascend rather than Nvidia — an infrastructure shift for the Chinese AI stack and a signal that US export restrictions have accelerated the formation of a self-sufficient inference ecosystem.

GLM-5V-Turbo: a natively multimodal foundation model for agents

Z.ai
Research official + media 2 src. ~1 min

Z.ai unveiled GLM-5V-Turbo, a multimodal foundation model in which visual perception is embedded as a first-class component of reasoning, planning, and tool use rather than bolted on after the fact. The model handles images, video, web pages, and documents; the authors report gains on multimodal coding, visual tool use, and agent tasks while preserving text-only quality. The role of end-to-end verification of agent trajectories during training is emphasized.

Why it matters
One of the most-hyped releases of the week on HF Daily — 2.28k upvotes. A bid for a natively multimodal agent (rather than a VLM with tacked-on tool use) — a direction in which Z.ai is systematically competing with GPT-5 and Gemini.

Yandex Commerce Protocol: first retailers launch sales via Alice AI

Yandex
Industry official + media 5 src. ~1 min

Yandex disclosed the first partners of the Yandex Commerce Protocol (YCP) — a standard for integrating online stores with AI scenarios in Alice AI, Search, and Yandex Ritm. Stockmann, restore:, the pharmacy chains Gorzdrav and 36.6, telecom operator Beeline, the brand The Act, and a number of other retailers are going live with sales directly from chat with Alice AI; over 200 large online retailers and brands have begun YCP integration, and more than 1,600 additional stores have applied. The technology lets shoppers proceed to checkout directly from the assistant dialog without visiting the merchant's website — Alice AI acts as a transactional AI agent on top of partner catalogs.

Why it matters
YCP is Yandex's bid to be the AI-commerce standard in the Russian-language internet and one of the first large-scale launches of an LLM assistant as a direct sales channel in Russia. If the protocol catches on, it shifts the role of voice and chat assistants from informational to transactional.
Full issue →

Anthropic launches Claude for Creative Work with connectors to Adobe, Blender, Ableton

Anthropic
Tools official + media 4 src. ~1 min

Anthropic announced the Claude for Creative Work bundle — nine official connectors that let Claude work directly with Adobe Creative Cloud, Blender, Autodesk Fusion, Ableton Live/Push, Affinity by Canva, Resolume, SketchUp, and Splice. In parallel, Anthropic Labs launched a new product, Claude Design, for rapid visual prototyping, and announced education programs with RISD, Ringling, and Goldsmiths.

Why it matters
Anthropic is moving beyond the "code and text assistant" niche into professional creative pipelines — for the first time, a major frontier lab gets an official place inside Adobe and Blender.

OpenAI brings GPT-5.5, Codex, and Managed Agents to Amazon Bedrock

OpenAI
Industry official + media 6 src. ~1 min

AWS and OpenAI expanded their partnership and launched three offerings on Amazon Bedrock in limited preview: OpenAI's frontier models (GPT-5.5 and GPT-5.4), the Codex agent with CLI/desktop/VS Code support, and Bedrock Managed Agents based on OpenAI. GA is promised within weeks; the models are integrated with IAM, PrivateLink, guardrails, and CloudTrail.

Why it matters
The release came a day after the end of OpenAI's exclusivity with Microsoft and effectively makes Bedrock a second full-fledged distribution channel for OpenAI's frontier models in the enterprise.

Mistral releases Medium 3.5 — 128B dense, 256k context, open weights

Mistral
Models / LLM official + media 5 src. ~1 min

Mistral AI introduced Mistral Medium 3.5 — a flagship dense model with 128B parameters, 256k context, and switchable reasoning effort. Weights are open under a modified MIT license and available on Hugging Face. In parallel, the company launched remote agents in Vibe (cloud coding sessions with CLI and "teleportation" of a local session into the cloud) and a Work mode in Le Chat for multi-step tasks. Claimed scores: 77.6% on SWE-Bench Verified and 91.4% on τ³-Telecom; API pricing is $1.5/$7.5 per million tokens.
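At the quoted API pricing ($1.5 per million input tokens, $7.5 per million output tokens), per-request cost is simple arithmetic. A back-of-the-envelope sketch; the token counts in the example are illustrative:

```python
INPUT_USD_PER_MTOK = 1.5   # quoted price per million input tokens
OUTPUT_USD_PER_MTOK = 7.5  # quoted price per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single API call at the quoted rates."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# A long-context coding request: 200k tokens in, 8k tokens out.
cost = request_cost(200_000, 8_000)
# 0.30 USD for input + 0.06 USD for output = 0.36 USD per request
```

The asymmetric pricing means output-heavy workloads (long reasoning traces, code generation) dominate the bill even when the 256k context is mostly filled with input.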

Why it matters
Mistral returns to the frontier with a cheap open-weight model on par with Claude Sonnet 4.5 in coding, while also offering its own analog of Codex/Claude Code — the strongest European release of spring 2026.
Full issue →

Firefly AI Assistant — Public Beta

Adobe
Image official + media 3 src. ~1 min

On April 27, Adobe launched the global public beta of an AI assistant that orchestrates multi-step creative workflows across 60+ Creative Cloud tools via chat prompts; the beta includes Creative Skills and integration with partner models (GPT Image 2, Veo 3.1, Runway Gen-4.5, ElevenLabs Multilingual v2).

Sora — final shutdown

OpenAI
Video official 1 src. ~1 min

On April 26, the Sora web and mobile apps were permanently shut down; the API will be discontinued on September 24, 2026.

Full issue →

Microsoft–OpenAI restructuring

Microsoft / OpenAI
Industry official + media 3 src. ~1 min

End of cloud exclusivity: OpenAI can sell products via AWS/Google Cloud; Microsoft license becomes non-exclusive. Microsoft remains the primary cloud partner and stops paying OpenAI a revenue share. OpenAI continues sharing revenue with Microsoft through 2030; IP license runs through 2032.

OpenClaw 2026.4.25

OpenClaw
Tools official 2 src. ~1 min

TTS expansion (`/tts latest`, new providers including Azure Speech). Plugin registry moved to cold storage for faster startup. Expanded OpenTelemetry monitoring. Calendar versioning `YYYY.M.D`.

Full issue →