Daily digest
14 items · ~14 min · Week 2026-W19
Must-read (2)
xAI Releases Grok 4.3 with 1M Context, 40-60% Price Cuts, and Agentic Benchmark Gains
xAI · xAI released Grok 4.3 on May 6 with a 1M-token context window, improved agentic tool calling, and price cuts of 37.5% on input and 58.3% on output versus Grok 4.20. The model scores 1,500 Elo on the GDPval-AA agentic benchmark (up 321 points from its predecessor) and 98% on tau2-Bench Telecom. It is priced at $1.25/M input tokens and $2.50/M output tokens.
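A quick arithmetic check of the pricing figures, as a sketch only: the previous Grok 4.20 prices are not stated in this item, so the values computed below are inferred from the quoted cuts and may differ from xAI's published list. The `job_cost` helper and its token counts are illustrative.

```python
# Prices per million tokens and percentage cuts as quoted in the item above.
NEW_INPUT_PRICE = 1.25
NEW_OUTPUT_PRICE = 2.50
INPUT_CUT = 0.375
OUTPUT_CUT = 0.583


def implied_previous_price(new_price, cut):
    """Invert new = old * (1 - cut). The result is an inference, not a quoted price."""
    return new_price / (1.0 - cut)


def job_cost(input_tokens, output_tokens,
             input_price=NEW_INPUT_PRICE, output_price=NEW_OUTPUT_PRICE):
    """Cost in dollars of one request at per-million-token prices."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price


prev_input = implied_previous_price(NEW_INPUT_PRICE, INPUT_CUT)
prev_output = implied_previous_price(NEW_OUTPUT_PRICE, OUTPUT_CUT)

# Hypothetical workload: a full 1M-token prompt with a 10k-token reply.
full_context_cost = job_cost(1_000_000, 10_000)

print(round(prev_input, 2), round(prev_output, 2), round(full_context_cost, 3))
```

Under these assumptions the quoted cuts are consistent with earlier prices of roughly $2.00/M input and $6.00/M output.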
Anthropic Signs SpaceX Colossus Compute Deal, Doubles Claude Code Rate Limits
Anthropic · Anthropic signed a compute partnership with SpaceX granting access to the full Colossus 1 data center (over 220,000 NVIDIA GPUs and 300+ MW), announced at the 'Code with Claude' developer conference in San Francisco on May 6. Effective immediately, Claude Code's five-hour rate limits were doubled for Pro, Max, Team, and Enterprise plans, peak-hour throttling was eliminated for Pro and Max, and API rate limits for Claude Opus models were raised substantially (up to 1,500% for Tier 1 input tokens per minute).
Worth knowing (5)
RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks
RLWRLD · RLWRLD published the RLDX-1 technical report (arXiv:2605.03269, 68 authors) presenting a robotic VLA policy built on the Multi-Stream Action Transformer (MSAT), integrating modalities via modality-specific streams with cross-modal joint self-attention. A three-stage training pipeline (internet-scale pre-training, embodiment mid-training, task fine-tuning) achieves 86.8% success on ALLEX humanoid tasks vs. ~40% for pi0.5 and GR00T N1.6. Synthetic data augmentation with motion-consistency filtering addresses rare manipulation scenarios.
GitHub Copilot VS Code April Releases: BYOK Model Keys, Browser Tab Sharing, Terminal Write
GitHub · GitHub published the consolidated changelog for Copilot in VS Code covering v1.116-v1.119 (April-early May 2026). Highlights: bring-your-own-model-key (BYOK) for Business and Enterprise via OpenRouter, Anthropic, Google, OpenAI, and Microsoft Foundry; agent read/write access to foreground terminals; browser tab sharing for agent interaction with live web content; semantic workspace search across GitHub orgs; code diffs in chat; and an experimental /chronicle feature for querying local chat history.
AWS MCP Server Reaches General Availability with Full API Access and IAM Audit Controls
Amazon Web Services · AWS made the AWS MCP Server generally available in US East (N. Virginia) and EU (Frankfurt) on May 6. It exposes all 15,000+ AWS API operations via a single call_aws MCP tool using existing IAM credentials, plus live-fetching documentation tools and a sandboxed run_script tool for running Python against AWS services. CloudWatch and CloudTrail provide audit trails; IAM context keys enable distinct permissions for human vs. agent operations.
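For orientation, a minimal sketch of what a tool invocation could look like on the wire. The JSON-RPC 2.0 envelope and the `tools/call` method come from the Model Context Protocol; the `cli_command` argument name and its value are hypothetical, since the item above does not describe the input schema of call_aws.

```python
import json


def build_tools_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# ASSUMPTION: "cli_command" and the command string are placeholders for
# illustration; consult the server's advertised tool schema for the real fields.
request = build_tools_call(1, "call_aws", {"cli_command": "aws s3api list-buckets"})
payload = json.dumps(request)
print(payload)
```

A client would normally discover the real argument schema through the server's tool listing before sending such a request.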
GitHub MCP Server: Secret Scanning GA and Dependency Scanning Public Preview
GitHub · GitHub shipped two MCP Server security features on May 5: secret scanning reached GA (respecting existing push protection customization), and dependency scanning entered public preview, enabling agents to scan code changes for vulnerable dependencies using the GitHub Advisory Database and Dependabot CLI. Both require GitHub Advanced Security or GitHub Secret Protection.
MiniMax Hailuo 2.3 Launches with Media Agent and 50% Cheaper Batch Video Generation
MiniMax · MiniMax released Hailuo 2.3 on May 7, cutting batch video generation costs by up to 50% while keeping base pricing unchanged. New capabilities include improved micro-expression realism, better physics-heavy motion handling, multi-style support (anime, ink-wash, game-CG), and enhanced motion command response. At the same time, MiniMax evolved Hailuo Video Agent into a full Media Agent for multi-modal creation, rolled out globally across web, mobile, and API.
For reference (7)
Google DeepMind Takes Minority Stake in CCP Games for Multi-Agent Research in EVE Online
Google DeepMind · Google DeepMind announced a partnership with CCP Games (EVE Online) on May 6, taking a minority stake to research player-driven systems and train AI in the game's complex, persistent multiplayer environment. Further research agenda details are expected at EVE Fanfest 2026; the collaboration focuses on emergent behavior in long-horizon multi-agent simulations.
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Shanghai Jiao Tong University · Researchers from SJTU introduced LongSeeker (arXiv:2605.05191), which addresses context explosion in long-horizon search agents via Context-ReAct: five adaptive operations (Skip, Compress, Rollback, Snippet, Delete) that dynamically reshape working memory based on relevance. LongSeeker, fine-tuned from Qwen3-30B-A3B, achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH.
Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic
Sergey Rodionov submitted a paper (arXiv:2605.05138, May 6) presenting a coding-agent approach to ARC-AGI-3 where the agent maintains an executable Python world model, validates it against prior observations, and applies a simplicity bias via refactoring. Tested across 25 public ARC-AGI-3 games without game-specific logic: 7 games fully solved, 6 games above 75% RHAE, mean RHAE of 32.58%.
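The validate-against-prior-observations step can be illustrated with a toy sketch. This is not the paper's code: the 1-D track game, the two candidate models, and the `mismatches` helper are all hypothetical stand-ins, and the simplicity-bias refactoring step is not modeled here.

```python
from typing import Callable, List, Tuple

State = int                              # toy state: position on a 1-D track
Transition = Tuple[State, str, State]    # (state, action, observed next state)
Model = Callable[[State, str], State]


def mismatches(model: Model, history: List[Transition]) -> List[Transition]:
    """Return every recorded transition the candidate model fails to reproduce."""
    return [(s, a, nxt) for (s, a, nxt) in history if model(s, a) != nxt]


def model_no_walls(state: State, action: str) -> State:
    """Candidate A: moves freely, unaware of the track boundaries."""
    return state + (1 if action == "right" else -1)


def model_with_walls(state: State, action: str) -> State:
    """Candidate B: same movement, clamped to positions 0..4."""
    return min(4, max(0, state + (1 if action == "right" else -1)))


# Observations gathered so far, including two bumps against the walls.
history = [(0, "right", 1), (1, "right", 2), (0, "left", 0), (4, "right", 4)]

failed_a = mismatches(model_no_walls, history)
failed_b = mismatches(model_with_walls, history)
print(len(failed_a), len(failed_b))
```

In this toy setup, candidate A is rejected because it contradicts both wall observations, while candidate B reproduces the full history and would be kept for planning.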
Claude Code v2.1.132: SIGINT Fix, MCP Memory Leak, Bedrock/Vertex Prompt Caching Fix
Anthropic · Claude Code v2.1.132 (May 6) adds a CLAUDE_CODE_SESSION_ID env var for Bash subprocesses and a CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN opt-out flag. Critical fixes: external SIGINT handling for IDE stop buttons, blank screen after laptop sleep or Ctrl+Z/fg, unbounded memory growth from non-protocol MCP server output, and Bedrock/Vertex 400 errors with prompt caching. Companion v2.1.131 fixed VS Code extension activation on Windows and Mantle endpoint authentication.
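A small sketch of how a script launched from a Bash subprocess might pick up the new variable, for example to tag log lines. Only the variable name comes from the release notes; the `session_tag` helper, the tag format, and the assumption that the value is an opaque string are illustrative.

```python
import os


def session_tag(env=None):
    """Return a log tag derived from CLAUDE_CODE_SESSION_ID, if it is set."""
    env = os.environ if env is None else env
    # ASSUMPTION: the value is treated as an opaque string; its format is not
    # described in the release notes summarized above.
    session_id = env.get("CLAUDE_CODE_SESSION_ID")
    return f"claude-session-{session_id}" if session_id else "no-session"


print(session_tag())
```

Passing a dictionary as `env` makes the helper easy to exercise outside a Claude Code session.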
OpenCode v1.14.40: Remote .well-known Config and Signed Reasoning Block Fixes
SST · OpenCode v1.14.40 (May 7) adds support for .well-known/opencode configs pointing to remote files, fixes signed reasoning blocks, and applies CORS headers before authentication. Bug fixes cover network option errors and invalid surrogate character sanitization. This follows v1.14.37-39 (May 5) which introduced session warping between workspaces, improved v2 session rendering, proper subtask cancellation, and desktop proxy env-var support.
Cursor 3.3: Context Usage Breakdown for Agent Diagnostics
Cursor · Cursor v3.3 (May 6) introduces a context usage breakdown panel: clicking an agent's context ring in the Agents Window shows a proportional breakdown of context consumed by rules, skills, MCP servers, subagents, and other components, helping developers pinpoint unexpectedly heavy context consumers.
VK Video AI Character Recognition Boosts Watch Time 9% via Cascade Face Detection
VK AI · VK published results from its AI character recognition system in VK Video recommendations (May 5). Two ML models run in cascade: the first scans at one frame per second to detect faces, the second identifies popular personalities among those detected. Recognized characters feed into VK Video's recommendation engine, surfacing content featuring users' preferred personalities. Since deployment, average watch time for 'Watch Next' videos with recurring characters increased by 9%.
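The two-stage cascade described above can be sketched roughly as follows. This is an illustration of the general pattern, not VK's implementation: both models are replaced by stub functions, frames are stand-in lists of labels, and all names are hypothetical.

```python
def sample_frame_indices(num_frames, fps, sample_rate_hz=1.0):
    """Indices of the frames to scan when sampling at sample_rate_hz."""
    step = max(1, round(fps / sample_rate_hz))
    return list(range(0, num_frames, step))


def run_cascade(frames, fps, detect_faces, identify):
    """Stage 1 detects faces on sampled frames; stage 2 names the known ones."""
    recognized = set()
    for idx in sample_frame_indices(len(frames), fps):
        for face in detect_faces(frames[idx]):
            name = identify(face)
            if name is not None:
                recognized.add(name)
    return recognized


# Stub models: a "frame" is just the list of faces visible in it.
KNOWN_PERSONALITIES = {"alice", "bob", "carol"}


def stub_detector(frame):
    return frame


def stub_identifier(face):
    return face if face in KNOWN_PERSONALITIES else None


frames = [[] for _ in range(75)]   # 3 seconds of video at 25 fps
frames[0] = ["alice"]
frames[10] = ["carol"]             # falls between samples, so never scanned
frames[25] = ["bob", "stranger"]

characters = run_cascade(frames, fps=25, detect_faces=stub_detector,
                         identify=stub_identifier)
print(sorted(characters))
```

Sampling at one frame per second keeps the cheaper detector from running on every frame, and the identification model only sees the faces that the first stage found.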