Daily digest

14 items · ~14 min · Week 2026-W19

Must-read (2)

xAI Releases Grok 4.3 with 1M Context, 40-60% Price Cuts, and Agentic Benchmark Gains

xAI
Models / LLM official + media 3 src. ~1 min

xAI released Grok 4.3 on May 6 with a 1M-token context window, improved agentic tool calling, and price cuts of 37.5% on input and 58.3% on output versus Grok 4.20. The model scores 1,500 Elo on the GDPval-AA agentic benchmark (up 321 points over its predecessor) and 98% on tau2-Bench Telecom. Pricing is $1.25/M input and $2.50/M output.

Why it matters
A 321-point Elo leap on agentic benchmarks, combined with $2.50/M output pricing, makes Grok 4.3 a direct competitor to GPT-5.5 and Gemini for enterprise agentic workflows at meaningfully lower cost.
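
As a sanity check, the stated cuts can be inverted to recover the implied Grok 4.20 prices (a quick sketch, assuming the quoted percentages are exact):

```python
# Back out predecessor prices implied by the stated cuts.
new_input, new_output = 1.25, 2.50    # Grok 4.3, $ per M tokens
cut_input, cut_output = 0.375, 0.583  # 37.5% and 58.3%

old_input = new_input / (1 - cut_input)     # implied Grok 4.20 input price
old_output = new_output / (1 - cut_output)  # implied Grok 4.20 output price

print(round(old_input, 2), round(old_output, 2))  # → 2.0 6.0
```

The implied predecessor prices ($2/M input, ~$6/M output) are consistent with the headline's rounded "40-60%" framing.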

Anthropic Signs SpaceX Colossus Compute Deal, Doubles Claude Code Rate Limits

Anthropic
Tools official + media 4 src. ~1 min

Anthropic signed a compute partnership with SpaceX granting access to the full Colossus 1 data center (over 220,000 NVIDIA GPUs and 300+ MW of power), announced at the 'Code with Claude' developer conference in San Francisco on May 6. Effective immediately, Claude Code's five-hour rate limits were doubled for Pro, Max, Team, and Enterprise plans, peak-hour throttling was eliminated for Pro and Max, and API rate limits for Claude Opus models were raised substantially (up to 1,500% for Tier 1 input tokens per minute).

Why it matters
Anthropic has repeatedly cited compute scarcity as the primary constraint on Claude Code limits; the SpaceX deal removes that ceiling for developer plans and signals aggressive infrastructure expansion ahead of a reported June IPO.

Worth knowing (5)

RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks

RLWRLD
Research official 1 src. ~1 min

RLWRLD published the RLDX-1 technical report (arXiv:2605.03269, 68 authors) presenting a robotic vision-language-action (VLA) policy built on the Multi-Stream Action Transformer (MSAT), which integrates modalities via modality-specific streams with cross-modal joint self-attention. A three-stage training pipeline (internet-scale pre-training, embodiment mid-training, task fine-tuning) achieves 86.8% success on ALLEX humanoid tasks vs. ~40% for pi0.5 and GR00T N1.6. Synthetic data augmentation with motion-consistency filtering addresses rare manipulation scenarios.

Why it matters
More than doubling success rates over frontier VLA competitors on humanoid tasks is a substantial result; RLWRLD's open-source aspirations (previewed at GTC 2026) could make this approach broadly accessible to the robotics research community.
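The cross-modal joint self-attention idea can be pictured in a few lines of NumPy: per-modality token streams are concatenated so that every token attends across modality boundaries. This is an illustrative sketch only; the stream names, shapes, and single-head attention are invented stand-ins, not RLDX-1's actual architecture or dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(streams, d=16, seed=0):
    """Concatenate per-modality token streams and attend jointly,
    so every token can attend across modality boundaries."""
    rng = np.random.default_rng(seed)
    x = np.concatenate(streams, axis=0)  # (T_total, d)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (T_total, T_total)
    return attn @ v

# Hypothetical modality streams: vision patches, language tokens, proprioception.
rng = np.random.default_rng(1)
vision = rng.standard_normal((8, 16))
language = rng.standard_normal((5, 16))
proprio = rng.standard_normal((3, 16))
out = joint_self_attention([vision, language, proprio])
print(out.shape)  # → (16, 16)
```

The alternative design, separate attention per stream with late fusion, would prevent (for example) a language token from directly attending to a proprioception token; joint attention over the concatenated sequence is what makes the fusion "cross-modal".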

GitHub Copilot VS Code April Releases: BYOK Model Keys, Browser Tab Sharing, Terminal Write

GitHub
Tools official 1 src. ~1 min

GitHub published the consolidated changelog for Copilot in VS Code covering v1.116-v1.119 (April-early May 2026). Highlights: bring-your-own-model-key (BYOK) for Business and Enterprise via OpenRouter, Anthropic, Google, OpenAI, and Microsoft Foundry; agent read/write access to foreground terminals; browser tab sharing for agent interaction with live web content; semantic workspace search across GitHub orgs; code diffs in chat; and an experimental /chronicle feature for querying local chat history.

Why it matters
BYOK unlocks any major AI provider for enterprise Copilot users; browser tab sharing and terminal write access represent a substantial push toward full computer-use agent capability within VS Code.

AWS MCP Server Reaches General Availability with Full API Access and IAM Audit Controls

Amazon Web Services
Tools official 1 src. ~1 min

AWS released the AWS MCP Server as generally available (US East N. Virginia and EU Frankfurt) on May 6. It exposes all 15,000+ AWS API operations via a single call_aws MCP tool using existing IAM credentials, plus live-fetching documentation tools and a sandboxed run_script tool for Python against AWS services. CloudWatch and CloudTrail provide audit trails; IAM context keys enable distinct permissions for human vs. agent operations.

Why it matters
GA status enables enterprise teams to deploy AI agents that autonomously manage AWS infrastructure with production-grade auditability and fine-grained IAM controls, lowering the barrier to agentic cloud operations.
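The single-tool design means an agent's request is just an MCP tools/call carrying the service command as an argument. Below is a minimal sketch of what such a request could look like; the tool name call_aws comes from the announcement, but the argument field name ("command") and its CLI-style value are assumptions, not AWS's published schema.

```python
import json

# Hypothetical MCP "tools/call" request routing an AWS CLI-style command
# through the single call_aws tool. Argument names are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "call_aws",
        "arguments": {"command": "s3api list-buckets"},
    },
}
print(json.dumps(request, indent=2))
```

Because every operation funnels through one tool under the caller's existing IAM credentials, permissioning reduces to standard IAM policy, which is what makes the distinct human-vs-agent context keys possible.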

GitHub MCP Server: Secret Scanning GA and Dependency Scanning Public Preview

GitHub
Tools official 2 src. ~1 min

GitHub shipped two MCP Server security features on May 5: secret scanning reached GA (respecting existing push protection customization), and dependency scanning entered public preview, enabling agents to scan code changes for vulnerable dependencies using the GitHub Advisory Database and Dependabot CLI. Both require GitHub Advanced Security or GitHub Secret Protection.

Why it matters
Bringing secret and dependency scanning into the MCP tool surface means AI coding agents can enforce security policies before code lands in PRs, shifting left from post-commit CI to within the agent workflow.

MiniMax Hailuo 2.3 Launches with Media Agent and 50% Cheaper Batch Video Generation

MiniMax
Video official + media 3 src. ~1 min

MiniMax released Hailuo 2.3 on May 7, cutting batch video generation costs by up to 50% while maintaining base pricing. New capabilities include improved micro-expression realism, better physics-heavy motion handling, multi-style support (anime, ink-wash, game-CG), and enhanced motion command response. Simultaneously, MiniMax evolved Hailuo Video Agent into a full Media Agent for multi-modal creation, rolled out globally across web, mobile, and API.

Why it matters
50% cheaper batch generation directly undercuts rivals and makes high-volume video workflows more accessible; the pivot to a multi-modal Media Agent signals MiniMax's ambition to own the full content-creation stack beyond video.

For reference (7)

Google DeepMind Takes Minority Stake in CCP Games for Multi-Agent Research in EVE Online

Google DeepMind
Industry media only 2 src. ~1 min

Google DeepMind announced a partnership with CCP Games (EVE Online) on May 6, taking a minority stake to research player-driven systems and train AI on the complex persistent multiplayer environment. Further research agenda details are expected at EVE Fanfest 2026; the collaboration focuses on emergent behavior in long-horizon multi-agent simulations.

Why it matters
EVE Online's player-driven economy and long-horizon multi-agent dynamics are uniquely suited for studying emergent behavior at scale — DeepMind gains a high-fidelity real-world simulation that no controlled lab setting can replicate.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Shanghai Jiao Tong University
Research official 1 src. ~1 min

Researchers from SJTU introduced LongSeeker (arXiv:2605.05191) addressing context explosion in long-horizon search agents via Context-ReAct: five adaptive operations (Skip, Compress, Rollback, Snippet, Delete) that dynamically reshape working memory based on relevance. LongSeeker, fine-tuned from Qwen3-30B-A3B, achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH.

Why it matters
Active working-memory shaping is shown to outperform accumulating all trajectory data for long-horizon agents, providing a benchmark-validated approach to a core agent reliability bottleneck.
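The five operations can be pictured as methods on a toy context buffer. The class and example strings below are invented for illustration; the learned policy that decides *when* to apply each operation is the part LongSeeker actually trains.

```python
class WorkingMemory:
    """Toy context buffer with the five Context-ReAct-style operations."""

    def __init__(self):
        self.entries = []

    def add(self, obs, skip=False):
        if not skip:                   # Skip: irrelevant observation never enters
            self.entries.append(obs)

    def compress(self, i, summary):    # Compress: replace an entry with a summary
        self.entries[i] = summary

    def snippet(self, i, start, end):  # Snippet: keep only the relevant span
        self.entries[i] = self.entries[i][start:end]

    def rollback(self, n):             # Rollback: drop the last n steps
        del self.entries[-n:]

    def delete(self, i):               # Delete: remove an entry outright
        del self.entries[i]

mem = WorkingMemory()
mem.add("boilerplate page", skip=True)
mem.add("long search result about topic X " * 10)
mem.compress(0, "result: topic X covered in source A")
mem.add("dead-end page")
mem.rollback(1)
print(mem.entries)  # → ['result: topic X covered in source A']
```

The contrast with vanilla ReAct is that the trajectory above would otherwise accumulate all three raw observations verbatim; here the buffer ends with a single compressed, relevant entry.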

Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic

Research official 1 src. ~1 min

Sergey Rodionov submitted a paper (arXiv:2605.05138, May 6) presenting a coding-agent approach to ARC-AGI-3 where the agent maintains an executable Python world model, validates it against prior observations, and applies a simplicity bias via refactoring. Tested across 25 public ARC-AGI-3 games without game-specific logic: 7 games fully solved, 6 games above 75% RHAE, mean RHAE of 32.58%.

Why it matters
ARC-AGI-3 is a new and significantly harder generalization benchmark; this establishes a game-general baseline and provides evidence that verifier-driven executable world models are a viable path, contributing to ongoing debates about symbolic vs. neural reasoning approaches.
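The verify-then-simplify loop can be sketched with plain Python functions standing in for candidate world models. The transition format and explicit complexity scores below are invented stand-ins for the paper's executable models and refactoring-based simplicity bias.

```python
def consistent(model, history):
    """A candidate survives only if it reproduces every past observation."""
    return all(model(s, a) == s2 for s, a, s2 in history)

def best_model(candidates, history):
    # Keep only models consistent with the history, then apply a
    # simplicity bias: prefer the lowest-complexity survivor.
    valid = [(m, c) for m, c in candidates if consistent(m, history)]
    return min(valid, key=lambda mc: mc[1])[0] if valid else None

# Observed (state, action, next_state) transitions from play.
history = [(0, "inc", 1), (1, "inc", 2), (2, "dec", 1)]

def m_simple(s, a):   # general rule
    return s + 1 if a == "inc" else s - 1

def m_overfit(s, a):  # memorized lookup table
    return {(0, "inc"): 1, (1, "inc"): 2, (2, "dec"): 1}.get((s, a))

def m_wrong(s, a):    # falsified by the history
    return s + 2

candidates = [(m_wrong, 1), (m_overfit, 3), (m_simple, 1)]
print(best_model(candidates, history).__name__)  # → m_simple
```

The lookup table is consistent with everything seen so far but cannot predict novel states; the simplicity bias is what pushes the agent toward the general rule, mirroring the paper's refactoring step.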

Claude Code v2.1.132: SIGINT Fix, MCP Memory Leak, Bedrock/Vertex Prompt Caching Fix

Anthropic
Tools official 1 src. ~1 min

Claude Code v2.1.132 (May 6) adds CLAUDE_CODE_SESSION_ID env var for Bash subprocesses and CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN opt-out flag. Critical fixes: external SIGINT handling for IDE stop buttons, blank screen after laptop sleep or Ctrl+Z/fg, unbounded memory growth from non-protocol MCP server output, and Bedrock/Vertex 400 errors with prompt caching. Companion v2.1.131 fixed VS Code extension activation on Windows and Mantle endpoint authentication.

Why it matters
The MCP memory-growth fix and Bedrock/Vertex prompt-caching fix directly unblock enterprise deployments; the SIGINT and fullscreen fixes resolve long-standing terminal workflow pain points.

OpenCode v1.14.40: Remote .well-known Config and Signed Reasoning Block Fixes

SST
Tools official 2 src. ~1 min

OpenCode v1.14.40 (May 7) adds support for .well-known/opencode configs pointing to remote files, fixes signed reasoning blocks, and applies CORS headers before authentication. Bug fixes cover network option errors and invalid surrogate character sanitization. This follows v1.14.37-39 (May 5) which introduced session warping between workspaces, improved v2 session rendering, proper subtask cancellation, and desktop proxy env-var support.

Why it matters
Remote .well-known config enables enterprise teams to centrally distribute OpenCode configuration without manual client updates, a key workflow improvement for organizations deploying OpenCode at scale.

Cursor 3.3: Context Usage Breakdown for Agent Diagnostics

Cursor
Tools official 1 src. ~1 min

Cursor v3.3 (May 6) introduces a context usage breakdown panel: clicking an agent's context ring in the Agents Window shows a proportional breakdown of context consumed by rules, skills, MCP servers, subagents, and other components, helping developers pinpoint unexpectedly heavy context consumers.

Why it matters
Context exhaustion is a top failure mode in multi-repo, multi-MCP agent sessions; granular visibility helps developers avoid hitting limits and reduce token costs.

VK Video AI Character Recognition Boosts Watch Time 9% via Cascade Face Detection

VK AI
Tools official 2 src. ~1 min

VK published results from its AI character recognition system in VK Video recommendations (May 5). Two ML models run in cascade: the first scans at one frame per second to detect faces, the second identifies popular personalities among those detected. Recognized characters feed into VK Video's recommendation engine, surfacing content featuring users' preferred personalities. Since deployment, average watch time for 'Watch Next' videos with recurring characters increased by 9%.

Why it matters
Character-level video understanding is a meaningful step beyond topic-based recommendations; VK's disclosure of both the architecture and a measured 9% watch-time lift provides a concrete benchmark for applied ML in Russian-language video platforms.
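
The cascade amounts to a cheap per-frame gate in front of a costlier identity model. A toy sketch, with both models replaced by hypothetical stand-ins and an assumed ~25 fps source so that sampling every 25th frame approximates one frame per second:

```python
def cascade(frames, detect_faces, identify, fps_stride=25):
    """Two-stage cascade: scan roughly one frame per second and run the
    expensive identifier only on frames where faces were found."""
    recognized = set()
    for i in range(0, len(frames), fps_stride):
        faces = detect_faces(frames[i])        # stage 1: cheap face detector
        if faces:
            recognized.update(identify(faces)) # stage 2: who is it?
    return recognized

# Hypothetical stand-ins for the two ML models.
frames = ["crowd", "empty", "host_closeup"] * 25
detect = lambda f: [f] if f != "empty" else []
identify = lambda faces: {"host"} if "host_closeup" in faces else {"unknown"}
print(cascade(frames, detect, identify))
```

The design choice is the usual cascade trade-off: the 1 fps detector keeps per-video cost bounded, while the identity model, the expensive stage, only ever sees frames the detector has already flagged.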