Daily digest

14 items · ~14 min · Week 2026-W19

Must-read (2)

xAI Releases Grok 4.3 with 1M Context, 40-60% Price Cuts, and Agentic Benchmark Gains

xAI
Models / LLM official + media 3 src. ~1 min

xAI released Grok 4.3 on May 6 with a 1M-token context window, improved agentic tool calling, and price cuts of 37.5% on input and 58.3% on output versus Grok 4.20. The model scores 1,500 Elo on the GDPval-AA agentic benchmark (up 321 points over its predecessor) and 98% on tau2-Bench Telecom. Pricing is $1.25/M input and $2.50/M output.

Why it matters
A 321-point Elo leap on agentic benchmarks, combined with $2.50/M output pricing, makes Grok 4.3 a direct competitor to GPT-5.5 and Gemini for enterprise agentic workflows at meaningfully lower cost.
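
As a sanity check, the stated cuts can be inverted to recover the implied Grok 4.20 prices (a quick sketch, assuming the quoted percentages are exact):

```python
# Back out predecessor prices implied by the stated cuts.
new_input, new_output = 1.25, 2.50    # Grok 4.3, $ per M tokens
cut_input, cut_output = 0.375, 0.583  # 37.5% and 58.3%

old_input = new_input / (1 - cut_input)     # implied Grok 4.20 input price
old_output = new_output / (1 - cut_output)  # implied Grok 4.20 output price

print(round(old_input, 2), round(old_output, 2))  # → 2.0 6.0
```

The implied predecessor prices ($2/M input, ~$6/M output) are consistent with the headline's rounded "40-60%" framing.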

Anthropic Signs SpaceX Colossus Compute Deal, Doubles Claude Code Rate Limits

Anthropic
Tools official + media 4 src. ~1 min

Anthropic signed a compute partnership with SpaceX granting access to the full Colossus 1 data center (over 220,000 NVIDIA GPUs and 300+ MW of power), announced at the 'Code with Claude' developer conference in San Francisco on May 6. Effective immediately, Claude Code's five-hour rate limits were doubled for Pro, Max, Team, and Enterprise plans, peak-hour throttling was eliminated for Pro and Max, and API rate limits for Claude Opus models were raised substantially (up to 1,500% for Tier 1 input tokens per minute).

Why it matters
Anthropic has repeatedly cited compute scarcity as the primary constraint on Claude Code limits; the SpaceX deal removes that ceiling for developer plans and signals aggressive infrastructure expansion ahead of a reported June IPO.

Worth knowing (5)

RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks

RLWRLD
Research official 1 src. ~1 min

RLWRLD published the RLDX-1 technical report (arXiv:2605.03269, 68 authors) presenting a robotic vision-language-action (VLA) policy built on the Multi-Stream Action Transformer (MSAT), which integrates modalities via modality-specific streams with cross-modal joint self-attention. A three-stage training pipeline (internet-scale pre-training, embodiment mid-training, task fine-tuning) achieves 86.8% success on ALLEX humanoid tasks vs. ~40% for pi0.5 and GR00T N1.6. Synthetic data augmentation with motion-consistency filtering addresses rare manipulation scenarios.

Why it matters
More than doubling success rates over frontier VLA competitors on humanoid tasks is a substantial result; RLWRLD's open-source aspirations (previewed at GTC 2026) could make this approach broadly accessible to the robotics research community.
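The cross-modal joint self-attention idea can be pictured in a few lines of NumPy: per-modality token streams are concatenated so that every token attends across modality boundaries. This is an illustrative sketch only; the stream names, shapes, and single-head attention are invented stand-ins, not RLDX-1's actual architecture or dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(streams, d=16, seed=0):
    """Concatenate per-modality token streams and attend jointly,
    so every token can attend across modality boundaries."""
    rng = np.random.default_rng(seed)
    x = np.concatenate(streams, axis=0)  # (T_total, d)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (T_total, T_total)
    return attn @ v

# Hypothetical modality streams: vision patches, language tokens, proprioception.
rng = np.random.default_rng(1)
vision = rng.standard_normal((8, 16))
language = rng.standard_normal((5, 16))
proprio = rng.standard_normal((3, 16))
out = joint_self_attention([vision, language, proprio])
print(out.shape)  # → (16, 16)
```

The alternative design, separate attention per stream with late fusion, would prevent (for example) a language token from directly attending to a proprioception token; joint attention over the concatenated sequence is what makes the fusion "cross-modal".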

GitHub Copilot VS Code April Releases: BYOK Model Keys, Browser Tab Sharing, Terminal Write

GitHub
Tools official 1 src. ~1 min

GitHub published the consolidated changelog for Copilot in VS Code covering v1.116-v1.119 (April-early May 2026). Highlights: bring-your-own-model-key (BYOK) for Business and Enterprise via OpenRouter, Anthropic, Google, OpenAI, and Microsoft Foundry; agent read/write access to foreground terminals; browser tab sharing for agent interaction with live web content; semantic workspace search across GitHub orgs; code diffs in chat; and an experimental /chronicle feature for querying local chat history.

Why it matters
BYOK unlocks any major AI provider for enterprise Copilot users; browser tab sharing and terminal write access represent a substantial push toward full computer-use agent capability within VS Code.

AWS MCP Server Reaches General Availability with Full API Access and IAM Audit Controls

Amazon Web Services
Tools official 1 src. ~1 min

AWS released the AWS MCP Server as generally available (US East N. Virginia and EU Frankfurt) on May 6. It exposes all 15,000+ AWS API operations via a single call_aws MCP tool using existing IAM credentials, plus live-fetching documentation tools and a sandboxed run_script tool for Python against AWS services. CloudWatch and CloudTrail provide audit trails; IAM context keys enable distinct permissions for human vs. agent operations.

Why it matters
GA status enables enterprise teams to deploy AI agents that autonomously manage AWS infrastructure with production-grade auditability and fine-grained IAM controls, lowering the barrier to agentic cloud operations.
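The single-tool design means an agent's request is just an MCP tools/call carrying the service command as an argument. Below is a minimal sketch of what such a request could look like; the tool name call_aws comes from the announcement, but the argument field name ("command") and its CLI-style value are assumptions, not AWS's published schema.

```python
import json

# Hypothetical MCP "tools/call" request routing an AWS CLI-style command
# through the single call_aws tool. Argument names are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "call_aws",
        "arguments": {"command": "s3api list-buckets"},
    },
}
print(json.dumps(request, indent=2))
```

Because every operation funnels through one tool under the caller's existing IAM credentials, permissioning reduces to standard IAM policy, which is what makes the distinct human-vs-agent context keys possible.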

GitHub MCP Server: Secret Scanning GA and Dependency Scanning Public Preview

GitHub
Tools official 2 src. ~1 min

GitHub shipped two MCP Server security features on May 5: secret scanning reached GA (respecting existing push protection customization), and dependency scanning entered public preview, enabling agents to scan code changes for vulnerable dependencies using the GitHub Advisory Database and Dependabot CLI. Both require GitHub Advanced Security or GitHub Secret Protection.

Why it matters
Bringing secret and dependency scanning into the MCP tool surface means AI coding agents can enforce security policies before code lands in PRs, shifting left from post-commit CI to within the agent workflow.

MiniMax Hailuo 2.3 Launches with Media Agent and 50% Cheaper Batch Video Generation

MiniMax
Video official + media 3 src. ~1 min

MiniMax released Hailuo 2.3 on May 7, cutting batch video generation costs by up to 50% while maintaining base pricing. New capabilities include improved micro-expression realism, better physics-heavy motion handling, multi-style support (anime, ink-wash, game-CG), and enhanced motion command response. Simultaneously, MiniMax evolved Hailuo Video Agent into a full Media Agent for multi-modal creation, rolled out globally across web, mobile, and API.

Why it matters
50% cheaper batch generation directly undercuts rivals and makes high-volume video workflows more accessible; the pivot to a multi-modal Media Agent signals MiniMax's ambition to own the full content-creation stack beyond video.

For reference (7)

Google DeepMind Takes Minority Stake in CCP Games for Multi-Agent Research in EVE Online

Google DeepMind
Industry media only 2 src. ~1 min

Google DeepMind announced a partnership with CCP Games (EVE Online) on May 6, taking a minority stake to research player-driven systems and train AI on the complex persistent multiplayer environment. Further research agenda details are expected at EVE Fanfest 2026; the collaboration focuses on emergent behavior in long-horizon multi-agent simulations.

Why it matters
EVE Online's player-driven economy and long-horizon multi-agent dynamics are uniquely suited for studying emergent behavior at scale — DeepMind gains a high-fidelity real-world simulation that no controlled lab setting can replicate.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Shanghai Jiao Tong University
Research official 1 src. ~1 min

Researchers from SJTU introduced LongSeeker (arXiv:2605.05191) addressing context explosion in long-horizon search agents via Context-ReAct: five adaptive operations (Skip, Compress, Rollback, Snippet, Delete) that dynamically reshape working memory based on relevance. LongSeeker, fine-tuned from Qwen3-30B-A3B, achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH.

Why it matters
Active working-memory shaping is shown to outperform accumulating all trajectory data for long-horizon agents, providing a benchmark-validated approach to a core agent reliability bottleneck.
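The five operations can be pictured as methods on a toy context buffer. The class and example strings below are invented for illustration; the learned policy that decides *when* to apply each operation is the part LongSeeker actually trains.

```python
class WorkingMemory:
    """Toy context buffer with the five Context-ReAct-style operations."""

    def __init__(self):
        self.entries = []

    def add(self, obs, skip=False):
        if not skip:                   # Skip: irrelevant observation never enters
            self.entries.append(obs)

    def compress(self, i, summary):    # Compress: replace an entry with a summary
        self.entries[i] = summary

    def snippet(self, i, start, end):  # Snippet: keep only the relevant span
        self.entries[i] = self.entries[i][start:end]

    def rollback(self, n):             # Rollback: drop the last n steps
        del self.entries[-n:]

    def delete(self, i):               # Delete: remove an entry outright
        del self.entries[i]

mem = WorkingMemory()
mem.add("boilerplate page", skip=True)
mem.add("long search result about topic X " * 10)
mem.compress(0, "result: topic X covered in source A")
mem.add("dead-end page")
mem.rollback(1)
print(mem.entries)  # → ['result: topic X covered in source A']
```

The contrast with vanilla ReAct is that the trajectory above would otherwise accumulate all three raw observations verbatim; here the buffer ends with a single compressed, relevant entry.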

Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic

Research official 1 src. ~1 min

Sergey Rodionov submitted a paper (arXiv:2605.05138, May 6) presenting a coding-agent approach to ARC-AGI-3 where the agent maintains an executable Python world model, validates it against prior observations, and applies a simplicity bias via refactoring. Tested across 25 public ARC-AGI-3 games without game-specific logic: 7 games fully solved, 6 games above 75% RHAE, mean RHAE of 32.58%.

Why it matters
ARC-AGI-3 is a new and significantly harder generalization benchmark; this establishes a game-general baseline and provides evidence that verifier-driven executable world models are a viable path, contributing to ongoing debates about symbolic vs. neural reasoning approaches.
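The verify-then-simplify loop can be sketched with plain Python functions standing in for candidate world models. The transition format and explicit complexity scores below are invented stand-ins for the paper's executable models and refactoring-based simplicity bias.

```python
def consistent(model, history):
    """A candidate survives only if it reproduces every past observation."""
    return all(model(s, a) == s2 for s, a, s2 in history)

def best_model(candidates, history):
    # Keep only models consistent with the history, then apply a
    # simplicity bias: prefer the lowest-complexity survivor.
    valid = [(m, c) for m, c in candidates if consistent(m, history)]
    return min(valid, key=lambda mc: mc[1])[0] if valid else None

# Observed (state, action, next_state) transitions from play.
history = [(0, "inc", 1), (1, "inc", 2), (2, "dec", 1)]

def m_simple(s, a):   # general rule
    return s + 1 if a == "inc" else s - 1

def m_overfit(s, a):  # memorized lookup table
    return {(0, "inc"): 1, (1, "inc"): 2, (2, "dec"): 1}.get((s, a))

def m_wrong(s, a):    # falsified by the history
    return s + 2

candidates = [(m_wrong, 1), (m_overfit, 3), (m_simple, 1)]
print(best_model(candidates, history).__name__)  # → m_simple
```

The lookup table is consistent with everything seen so far but cannot predict novel states; the simplicity bias is what pushes the agent toward the general rule, mirroring the paper's refactoring step.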

Claude Code v2.1.132: SIGINT Fix, MCP Memory Leak, Bedrock/Vertex Prompt Caching Fix

Anthropic
Tools official 1 src. ~1 min

Claude Code v2.1.132 (May 6) adds CLAUDE_CODE_SESSION_ID env var for Bash subprocesses and CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN opt-out flag. Critical fixes: external SIGINT handling for IDE stop buttons, blank screen after laptop sleep or Ctrl+Z/fg, unbounded memory growth from non-protocol MCP server output, and Bedrock/Vertex 400 errors with prompt caching. Companion v2.1.131 fixed VS Code extension activation on Windows and Mantle endpoint authentication.

Why it matters
The MCP memory-growth fix and Bedrock/Vertex prompt-caching fix directly unblock enterprise deployments; the SIGINT and fullscreen fixes resolve long-standing terminal workflow pain points.

OpenCode v1.14.40: Remote .well-known Config and Signed Reasoning Block Fixes

SST
Tools official 2 src. ~1 min

OpenCode v1.14.40 (May 7) adds support for .well-known/opencode configs pointing to remote files, fixes signed reasoning blocks, and applies CORS headers before authentication. Bug fixes cover network option errors and invalid surrogate character sanitization. This follows v1.14.37-39 (May 5) which introduced session warping between workspaces, improved v2 session rendering, proper subtask cancellation, and desktop proxy env-var support.

Why it matters
Remote .well-known config enables enterprise teams to centrally distribute OpenCode configuration without manual client updates, a key workflow improvement for organizations deploying OpenCode at scale.

Cursor 3.3: Context Usage Breakdown for Agent Diagnostics

Cursor
Tools official 1 src. ~1 min

Cursor v3.3 (May 6) introduces a context usage breakdown panel: clicking an agent's context ring in the Agents Window shows a proportional breakdown of context consumed by rules, skills, MCP servers, subagents, and other components, helping developers pinpoint unexpectedly heavy context consumers.

Why it matters
Context exhaustion is a top failure mode in multi-repo, multi-MCP agent sessions; granular visibility helps developers avoid hitting limits and reduce token costs.

VK Video AI Character Recognition Boosts Watch Time 9% via Cascade Face Detection

VK AI
Tools official 2 src. ~1 min

VK published results from its AI character recognition system in VK Video recommendations (May 5). Two ML models run in cascade: the first scans at one frame per second to detect faces, the second identifies popular personalities among those detected. Recognized characters feed into VK Video's recommendation engine, surfacing content featuring users' preferred personalities. Since deployment, average watch time for 'Watch Next' videos with recurring characters increased by 9%.

Why it matters
Character-level video understanding is a meaningful step beyond topic-based recommendations; VK's disclosure of both the architecture and a measured 9% watch-time lift provides a concrete benchmark for applied ML in Russian-language video platforms.
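
The cascade amounts to a cheap per-frame gate in front of a costlier identity model. A toy sketch, with both models replaced by hypothetical stand-ins and an assumed ~25 fps source so that sampling every 25th frame approximates one frame per second:

```python
def cascade(frames, detect_faces, identify, fps_stride=25):
    """Two-stage cascade: scan roughly one frame per second and run the
    expensive identifier only on frames where faces were found."""
    recognized = set()
    for i in range(0, len(frames), fps_stride):
        faces = detect_faces(frames[i])        # stage 1: cheap face detector
        if faces:
            recognized.update(identify(faces)) # stage 2: who is it?
    return recognized

# Hypothetical stand-ins for the two ML models.
frames = ["crowd", "empty", "host_closeup"] * 25
detect = lambda f: [f] if f != "empty" else []
identify = lambda faces: {"host"} if "host_closeup" in faces else {"unknown"}
print(cascade(frames, detect, identify))
```

The design choice is the usual cascade trade-off: the 1 fps detector keeps per-video cost bounded, while the identity model, the expensive stage, only ever sees frames the detector has already flagged.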