Daily digest
12 items · ~12 min · Week 2026-W21
Must-read (2)
Google I/O 2026: Gemini 4, Jules V2, Firebase Studio GA, Android XR, and Aluminium OS
Google DeepMind
Google I/O 2026 opened May 19 at Shoreline Amphitheatre. The keynote announced Gemini 4 with a multi-million-token context window and native multimodal (audio/video) processing, alongside 'Gemini Intelligence', a proactive ambient AI layer integrated across Android 17, Chrome, and new hardware. Developer highlights: Jules V2 (codename Project Jitro), an outcome-driven coding agent where developers set goals (e.g. 'raise test coverage to 80%') rather than discrete tasks, and Firebase Studio, which reaches general availability as a cloud-native dev workspace combining Code OSS, no-code prototyping, and Figma integration. Hardware previews: Android XR glasses with Gemini integration, 'Googlebook' laptops, and Aluminium OS, an Android-based desktop platform replacing ChromeOS. Gemini Omni, capable of generating and editing video natively in chat, was also previewed alongside Veo updates.
LongLive-2.0: NVFP4 Parallel Infrastructure for Long Video Generation (NVIDIA, 1,220 HF upvotes)
NVIDIA
NVIDIA introduces LongLive-2.0, an NVFP4-based (4-bit floating point) parallel infrastructure for long video generation. Key innovations: Balanced Sequence Parallelism for autoregressive training, elimination of ODE initialization dependencies, and W4A4 NVFP4 inference with quantized KV cache and asynchronous streaming VAE decoding. It achieves a 2.15× training speedup and a 1.84× inference speedup, reaching 45.7 FPS on the 5B model. Code and models are publicly released.
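For readers unfamiliar with 4-bit floating point, the sketch below shows block-scaled quantization to the E2M1 value grid (1 sign, 2 exponent, 1 mantissa bit). It is an illustration only, not LongLive-2.0's kernels: the function names are hypothetical, the block size of 16 is an assumption, and the per-block scale is kept as a plain Python float for simplicity rather than in a low-precision format.

```python
# Magnitudes representable by a 4-bit E2M1 float.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(values):
    """Map one block onto the E2M1 grid using a single shared scale."""
    amax = max(abs(v) for v in values)
    # Choose the scale so the largest magnitude lands on the top grid value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    quantized = []
    for v in values:
        # Round the scaled magnitude to the nearest grid point.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(-mag if v < 0 else mag)
    return scale, quantized


def dequantize_block(scale, quantized):
    """Recover approximate real values from grid values and the block scale."""
    return [scale * q for q in quantized]


def fake_quantize(values, block_size=16):
    """Quantize then dequantize, block by block, to expose the rounding error."""
    out = []
    for start in range(0, len(values), block_size):
        scale, q = quantize_block(values[start:start + block_size])
        out.extend(dequantize_block(scale, q))
    return out


print(fake_quantize([12.0, 2.2]))  # [12.0, 2.0]
```

Each block stores only eight possible magnitudes plus one scale, which is where the memory and bandwidth savings come from; the cost is the rounding error visible in the second value above.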
Worth knowing (5)
Anthropic Acquires Stainless, the SDK and MCP Tooling Startup Used by OpenAI and Google
Anthropic
Anthropic announced the acquisition of Stainless, a New York-based startup (founded 2022) that has built and maintained Anthropic's official SDKs since the earliest API days. The deal is reported at over $300 million. Stainless also built SDKs for OpenAI, Google, and Cloudflare. Anthropic plans to wind down all hosted Stainless products, including its third-party SDK generator, though existing customers retain full rights to already-generated SDKs. The acquisition is framed as a move to strengthen Claude's agent connectivity via the Model Context Protocol (MCP) ecosystem.
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence (178 HF upvotes)
Peking University / Shanghai Artificial Intelligence Laboratory
CiteVQA evaluates multimodal LLMs not just on answer correctness but also on whether they cite the correct source region within documents. It introduces Strict Attributed Accuracy (SAA), which requires both the answer and its bounding-box citation to be correct. The benchmark covers 1,897 questions across 711 PDFs in seven domains and two languages. Testing 20 MLLMs reveals widespread 'Attribution Hallucination': models frequently produce correct answers while citing the wrong passages. Even the strongest model (Gemini-3.1-Pro-Preview) achieves only 76.0% SAA; the best open-source model reaches 22.5%.
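To make the SAA definition concrete, here is a minimal sketch of a metric that credits an item only when both the answer and the cited region are right. The exact-match answer check, the IoU threshold of 0.5, and all names below are illustrative assumptions, not the benchmark's published scoring protocol.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def strict_attributed_accuracy(preds, golds, iou_threshold=0.5):
    """Fraction of items where BOTH the answer and the cited box are correct."""
    hits = 0
    for (pred_answer, pred_box), (gold_answer, gold_box) in zip(preds, golds):
        answer_ok = pred_answer.strip().lower() == gold_answer.strip().lower()
        citation_ok = iou(pred_box, gold_box) >= iou_threshold
        hits += int(answer_ok and citation_ok)
    return hits / len(golds) if golds else 0.0


golds = [("42", Box(0, 0, 10, 10)), ("Paris", Box(0, 0, 10, 10))]
preds = [("42", Box(0, 0, 10, 10)), ("Paris", Box(20, 20, 30, 30))]
print(strict_attributed_accuracy(preds, golds))  # 0.5
```

In this toy case plain answer accuracy would be 1.0, while the strict score is 0.5: the second prediction is right but cites a region that does not overlap the evidence, which is the failure mode the benchmark calls attribution hallucination.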
PhysBrain 1.0: Human Egocentric Video as Robot Training Data for VLA Models (133 HF upvotes)
DeepCybo
PhysBrain 1.0 is a vision-language-action model that acquires physical commonsense from large-scale human egocentric video (Ego4D and similar datasets) before robot adaptation, rather than relying solely on expensive robot trajectory data. A schema-driven data engine extracts structured scene meta-information and converts it into physically grounded QA. Multi-model annotation pools (GPT-5, Gemini 3.1 Pro, Qwen3 variants) generate diverse supervision. The resulting priors transfer to robot control via a capability-preserving VLA adapter. PhysBrain 1.0 achieves state-of-the-art results on the ERQA, PhysBench, SimplerEnv, LIBERO, and RoboCasa benchmarks, with particularly strong out-of-domain generalization.
MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes)
Shanghai Jiao Tong University
MMSkills introduces a framework for equipping visual AI agents with reusable multimodal procedural knowledge. Each skill package combines a textual procedure with runtime state cards and multi-view keyframes. An agentic trajectory-to-skill generator transforms public interaction trajectories into reusable skills through workflow grouping, procedure induction, visual grounding, and meta-skill-guided auditing. At runtime, a branch-loaded multimodal skill agent inspects visual cards and keyframes, aligns them with the live environment, and distills structured guidance. Experiments on GUI and game-based benchmarks show consistent improvements for both frontier and smaller multimodal agents.
OpenAI Codex v0.131.0: Unified Mention Picker, codex doctor Diagnostics, Python SDK Rename
OpenAI
OpenAI Codex v0.131.0 stable (May 18) delivers: a unified `@` mention picker searching files, directories, plugins, and skills in one place; `codex doctor`, a new diagnostic subcommand covering runtime, auth, terminal, network, config, and local state; the Python SDK package renamed to `openai-codex` / `openai_codex` with pinned runtime-generated types and concurrent turn routing; richer TUI session controls including blended token usage display and permissions/approval mode; plugin marketplace CLI commands and version-aware sharing; and remote workflow daemon management. Bug fixes harden Windows sandbox behavior and fix TUI rendering (URL wrapping, light-mode contrast, Shift+Enter in tmux).
For reference (5)
NudgeRL: Strategy-Level Context Nudges for Efficient RLVR Exploration
KAIST AI
NudgeRL addresses exploration inefficiency in reinforcement learning with verifiable rewards (RLVR). The framework introduces lightweight strategy-level context nudges that induce diverse reasoning trajectories without oracle supervision or expensive rollout scaling. A unified learning objective decomposes rewards into inter- and intra-context components, with distillation transferring the learned behaviors back to the base policy. Across five math reasoning benchmarks, NudgeRL outperforms standard GRPO even when GRPO is given up to 8× larger rollout budgets, while remaining competitive with oracle-guided methods.
Claude Code v2.1.144: /resume for Background Sessions, Faster MCP Startup, 75s Timeout Fix
Anthropic
Claude Code v2.1.144 (May 19) adds /resume support so that background sessions started via `claude --bg` or agent view appear alongside interactive ones. The /plugin browse pane now shows plugin last-updated dates; /model changes the model for the current session only (press `d` to set the default for new sessions); SDK/headless MCP startup is up to 2 seconds faster with slow MCP servers. Bug fixes: a startup hang of up to 75s when api.anthropic.com was unreachable (it now times out after 15s), terminal rendering glitches, and macOS background sessions crashing in Full Disk Access-protected folders.
SST OpenCode v1.15.5: Experimental OpenAI Runtime Path, --replay Session History
SST
SST OpenCode v1.15.5 (May 18) introduces an experimental OpenAI native runtime path (preview), adds `--replay` and `--replay-limit` flags to view recent session history during interactive runs, fixes plugin tools using the `ask` function so tool calls complete correctly, reduces subscription race conditions causing missed /event updates, sorts the v2 session list by most recently updated, and refreshes the TUI prompt layout after pasting content.
OpenClaw v2026.5.18: defineToolPlugin SDK, HTTPS Forward Proxy, Python Debugging Skill
OpenClaw
OpenClaw v2026.5.18 stable (May 18) adds: a new `defineToolPlugin` API plus `openclaw plugins build`, `validate`, and `init` CLI commands for typed simple tool plugins with auto-generated manifest metadata; HTTPS managed forward-proxy endpoint support with scoped `proxy.tls.caFile` CA trust; a Python debugging skill covering pdb, breakpoint(), post-mortem inspection, and debugpy remote attach; modal dialog surfacing in browser snapshots; and over 100 bug fixes. The stable v2026.5.12 release made installs leaner by moving WhatsApp, Slack, and Bedrock provider code out of the core runtime.
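As a reminder of what post-mortem inspection looks like in plain Python, the sketch below uses only the standard library. It is not OpenClaw's skill; `average` and `failing_frame_locals` are hypothetical helpers written for this example. It walks the traceback to the frame that raised and returns its local variables, with an optional interactive `pdb.post_mortem` session.

```python
import pdb
import sys


def average(values):
    return sum(values) / len(values)  # raises ZeroDivisionError on an empty list


def failing_frame_locals(func, *args, interactive=False):
    """Run func; if it raises, return the locals of the frame that raised."""
    try:
        func(*args)
    except Exception:
        tb = sys.exc_info()[2]
        if interactive:
            pdb.post_mortem(tb)  # opens the debugger at the failing frame
        while tb.tb_next is not None:  # walk to the innermost frame
            tb = tb.tb_next
        return dict(tb.tb_frame.f_locals)
    return None


print(failing_frame_locals(average, []))  # {'values': []}
```

For live debugging rather than inspection after the fact, placing a `breakpoint()` call inside `average` would drop into pdb at that line instead.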
GitHub Copilot CLI v1.0.49: /rubber-duck Critique Command, /chronicle Search, Alpine Linux
GitHub (Microsoft)
GitHub Copilot CLI v1.0.49 (May 18) adds: `/rubber-duck`, a command to get an independent critique of the agent's current work without the agent being defensive about its own output; `/chronicle search` to search all session content by keyword or topic; a `/memory on|off|show` slash command for persistent memory management; `copilot plugin update --all` to update all plugins simultaneously; Alpine Linux (musl libc) support; an improved `postToolUse` hook with additionalContext injected as a system message; and an input prompt that collapses to a single line when empty.