Daily digest
13 items · ~13 min · Week 2026-W19
Must-read (1)
OpenAI Releases GPT-5.5 Instant as New Default ChatGPT Model
OpenAI replaced GPT-5.3 Instant with GPT-5.5 Instant as the default model for all ChatGPT users, reporting 52.5% fewer hallucinated claims and 37.3% fewer factual errors on hard prompts, while cutting response length by ~30%. The update also introduces personalization that draws on past conversations, uploaded files, and connected Gmail, with memory sources visible and editable by users.
Worth knowing (6)
ElevenLabs Surpasses $500M ARR, Adds BlackRock and Nvidia to Series D
ElevenLabs disclosed that its annualized recurring revenue crossed $500 million in Q1 2026, up from $350 million at year-end 2025. The company revealed the third close of its Series D fundraise (originally announced in February at an $11B valuation), adding BlackRock, Wellington, Nvidia, Salesforce Ventures, Jamie Foxx, Eva Longoria, and Squid Game creator Hwang Dong-hyuk as new investors, bringing total Series D proceeds above $550 million.
OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x
OpenAI published a post-mortem tracing how GPT-5.1 through GPT-5.4 developed an anomalous tendency to use goblin and gremlin metaphors. The root cause was a 'Nerdy personality' RLHF training condition where creature metaphors received disproportionately high rewards; the behavior then leaked proportionally into non-Nerdy outputs via RL generalization. The Nerdy personality accounted for only 2.5% of responses but 66.7% of all goblin mentions, demonstrating that RL-learned behaviors do not stay neatly scoped to the conditions that produced them.
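A quick back-of-envelope check puts the reported numbers in perspective: if the Nerdy condition produced 2.5% of responses but 66.7% of goblin mentions, goblin metaphors were roughly 27× over-represented in Nerdy outputs (this calculation is our illustration, not from the post-mortem):

```python
# Enrichment of goblin mentions in the Nerdy personality, from the
# post-mortem's reported shares (illustrative arithmetic only).
nerdy_share_of_responses = 0.025  # 2.5% of all responses
nerdy_share_of_goblins = 0.667    # 66.7% of all goblin mentions

enrichment = nerdy_share_of_goblins / nerdy_share_of_responses
print(round(enrichment, 1))  # → 26.7
```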
Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs
The paper introduces Ctx2Skill, a self-improving framework for autonomous context-skill discovery in language models. A multi-agent self-play loop pits a Challenger (generating probing tasks) against a Reasoner (solving them using evolving skills), with a Judge providing feedback and a Cross-time Replay mechanism preventing skill degradation. Tested on four context-learning benchmarks, Ctx2Skill consistently improves performance across different LLM backbones without any human-authored skills.
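The control flow of the loop can be sketched in a toy form. In the real framework the Challenger, Reasoner, and Judge are LLM agents and the skills are learned artifacts; the stub functions, task family, and skill-library shape below are entirely hypothetical, chosen only to make the Challenger → Reasoner → Judge cycle and the cross-time replay check concrete:

```python
# Toy sketch of a Ctx2Skill-style self-play loop (all names hypothetical).
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)  # skill name -> solver fn
    replay: list = field(default_factory=list)  # past (skill, input, expected)

def challenger(round_i):
    # Generates a probing task; here trivially: "sum the integers 0..n".
    n = round_i + 2
    return ("sum_upto", n, n * (n + 1) // 2)  # (skill name, input, expected)

def reasoner(lib, name, x):
    # Solves with an existing skill if one exists, else derives and stores one.
    if name not in lib.skills:
        lib.skills[name] = lambda n: sum(range(n + 1))
    return lib.skills[name](x)

def judge(answer, expected):
    # Feedback signal: here a simple exact-match check.
    return answer == expected

def self_play(lib, rounds=3):
    for i in range(rounds):
        name, x, expected = challenger(i)
        assert judge(reasoner(lib, name, x), expected)
        lib.replay.append((name, x, expected))
        # Cross-time replay: re-verify earlier tasks so skills don't degrade.
        for (n2, x2, e2) in lib.replay:
            assert judge(reasoner(lib, n2, x2), e2)
    return lib

lib = self_play(SkillLibrary())
```

The replay pass is the part worth noting: every round re-runs all previously solved tasks, so any edit to the skill library that regresses an old capability fails immediately.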
Anthropic Launches Ten Financial Services AI Agent Templates with Microsoft 365 Integration
Anthropic released ten pre-built AI agent templates for financial services tasks — covering pitchbooks, KYC screening, earnings review, month-end close, and more — alongside general availability of Claude add-ins for Microsoft Excel, PowerPoint, and Word. The announcement coincided with Anthropic's financial services briefing event and highlighted Claude Opus 4.7's top score on the Vals AI Finance Agent benchmark. Production deployments at JPMorganChase, Goldman Sachs, and Citi were confirmed.
Roo Code Announces Shutdown on May 15, Pivoting to Roomote Cloud Agent
Roo Code, a VS Code extension fork of Cline with 3 million installs and 23K GitHub stars, announced it will shut down its extension, cloud, and router products on May 15, 2026. The team cited a belief that IDEs are not the future of coding and is redirecting resources to Roomote, a cloud-based coding agent that runs tasks end-to-end across Slack, GitHub, and Linear. Cline is recommended as the open-source successor for existing users.
SGLang v0.5.11: Speculative Decoding V2 as Default and Eight New Model Architectures
SGLang v0.5.11 switches to CUDA 13 + PyTorch 2.11 as its default baseline and enables Speculative Decoding V2 with overlap scheduling by default, reducing per-step CPU cost. The release adds support for eight new model architectures including Gemma 4, GLM-5.1, Qwen3.6, and Kimi-K2.6, and extends LoRA support to frontier-scale MLA-based MoE models such as DeepSeek-V3.
For reference (6)
HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL
HeavySkill reframes 'heavy thinking' in LLMs not as an external orchestration artifact but as a learnable, internalized skill consisting of two stages: parallel reasoning followed by summarization. The authors show via reinforcement learning that this skill can be deepened and broadened, with empirical results demonstrating consistent improvements over Best-of-N strategies.
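The two-stage structure can be sketched with a toy numeric task. The paths, the noise model, and the median-based summarizer below are our invention purely to contrast summarization over all paths with Best-of-N's pick-one-winner approach; the paper's actual method trains this behavior into the model via RL:

```python
# Toy contrast: Best-of-N selection vs. a two-stage "heavy thinking" skill
# (parallel reasoning, then summarization). All details are illustrative.
import statistics

def reason_paths(x, n=5):
    # Stage 1: parallel reasoning — n independent, deterministically "noisy"
    # attempts at recovering the value x.
    return [x + (-1) ** i * i * 0.1 for i in range(n)]

def best_of_n(candidates, score):
    # Baseline: keep only the single highest-scoring path.
    return max(candidates, key=score)

def heavy_thinking(x):
    # Stage 2: summarization — aggregate all paths (here, a median vote)
    # instead of discarding everything but one winner.
    return statistics.median(reason_paths(x))
```

The difference in spirit: Best-of-N throws away all but one reasoning trace, while the summarization stage can combine evidence across traces.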
OpenCode v1.14.36–v1.14.39: Cascading Task Cancellation and Workspace Warping
SST's OpenCode shipped four releases (v1.14.36–v1.14.39) on May 5–6, 2026. Key additions: cascading task cancellation propagates to all child subtask sessions; sessions can now be warped into another workspace without restarting; HTTP_PROXY environment variable is honored in the desktop app; system CA certificates are trusted for HTTPS connections, resolving enterprise TLS interception issues.
OpenClaw 2026.5.4: Google Meet Voice Bridge with Gemini and Backpressure-Aware Audio
OpenClaw released version 2026.5.4 on May 5, 2026, adding Twilio dial-in integration with a real-time Gemini voice bridge and paced audio streaming with backpressure-aware buffering for Google Meet calls. The release also includes a new file transfer plugin with binary file operations and per-node path policies, and fixes a Windows loopback binding issue that was blocking localhost HTTP requests.
vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11
vLLM v0.20.1, released May 4, 2026, is a patch release stabilizing DeepSeek V4 on the new CUDA 13 + PyTorch 2.11 baseline established in v0.20.0. Fixes include a persistent topk cooperative deadlock, NVFP4 MoE kernel support for RTX Blackwell workstation GPUs, and multi-stream pre-attention GEMM performance improvements. The v0.20.x series also added HuggingFace Transformers v5 support.
Ollama v0.23.1: Gemma 4 MTP Speculative Decoding Delivers 2× Speed on Apple Silicon
Ollama v0.23.1, released May 5, 2026, introduces Gemma 4 MTP (Multi-Token Prediction) speculative decoding for the MLX runner on Apple Silicon, delivering over 2× speed improvement for the Gemma 4 31B model on coding tasks. The release also includes MLX and MLX-C threading fixes and a Go 1.26 language bump.
Jama Connect 9.35 Launches First MCP Server for Engineering Requirements Management
Jama Software launched an official MCP server for Jama Connect 9.35 on May 4, 2026, making it the first engineering management platform to offer native MCP server support. Engineers can use Claude, Codex, Cursor, GitHub Copilot, and other AI-enabled environments to query and iterate on requirements, while existing permissions, lifecycle workflows, and audit requirements are enforced automatically.
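For readers unfamiliar with MCP, the protocol rides on JSON-RPC 2.0, so a client-side tool invocation against such a server would look roughly like the sketch below. The tool name and arguments are invented for illustration — the actual tools exposed by the Jama Connect server would be discovered via the protocol's `tools/list` method:

```python
# Hypothetical MCP tool-call request (JSON-RPC 2.0 envelope per the MCP spec;
# the "search_requirements" tool and its arguments are made up for this sketch).
import json

def make_tool_call(tool, arguments, req_id=1):
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

req = make_tool_call("search_requirements",
                     {"query": "brake latency", "project": "AUTO-1"})
```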