Daily digest
10 items · ~10 min · Week 2026-W24
Worth knowing (4)
ElevenLabs Music v2: Mid-Track Genre Switching, Inpainting, and Commercial Clearance
ElevenLabs
ElevenLabs released Music v2 on May 26, 2026, introducing mid-track genre transitions (e.g., opera to heavy metal within a single composition), section-by-section structural building (intro, verse, chorus, bridge, outro), audio inpainting to regenerate specific segments without affecting the rest, non-musical sound effect embedding within tracks, and sustained dense lyrical delivery including fast rap. Trained exclusively on licensed data, the model is cleared for commercial use with no sync fees. Pricing was cut by up to 50% for ElevenAPI and by up to 40% for ElevenCreative self-serve customers.
Claude Code v2.1.169: Safe Mode Flag, /cd Command, and disableBundledSkills Setting
Anthropic
Version 2.1.169 (June 8, 2026) adds a `--safe-mode` flag (and `CLAUDE_CODE_SAFE_MODE` env var) that disables all customizations (CLAUDE.md, plugins, skills, hooks, MCP servers) for clean troubleshooting. The `/cd` command moves a session to a new working directory without breaking the prompt cache. A `disableBundledSkills` setting hides bundled skills and built-in slash commands from the model. Fixes cover Up/Down arrow navigation in long input lines, enterprise MCP policy enforcement bugs, a macOS UI stall for claude.ai-authenticated users, and `claude -p` slowness on Windows (a regression from 2.1.161). The earlier v2.1.166 release (June 6) added `fallbackModel` support for up to three fallback models, glob pattern support in deny rules, and hardened cross-session messaging security.
Cursor 3.7: Canvas Design Mode, Context Usage Reports, and SDK Nested Subagents
Cursor
Cursor 3.7 (June 4–5, 2026) introduces Design Mode in canvases: developers click, draw, or describe UI changes by voice directly over rendered components to guide edits without writing descriptions. Multi-select and voice input work while an agent is mid-run. A new interactive context usage report in canvases shows token distribution across system prompt, tool definitions, rules, skills, and more. The SDK update adds custom tools via `local.customTools`, auto-review routing for tool calls, JSONL and custom store persistence options, and nested subagents that can spawn their own subagents at any depth. Enterprise customers gained multi-team organization management with separate security, governance, and budget controls (GA as of June 3).
vLLM Semantic Router v0.3 Themis: Stateful Production Routing with Session-Aware Agentic Routing
vLLM Semantic Router v0.3 (codename Themis), released June 5, 2026, transforms routing from a classification tool into a stateful, observable production system. Key additions: a unified v0.3 configuration format eliminating dialect fragmentation; signal enrichment extracting evidence from 15+ signal families (auth, safety, conversation shape, tool-loop detection); Session-Aware Agentic Routing (SAAR) combining router-owned session memory, safety locks during tool loops, provider-state portability checks, and replayable diagnostics; a revamped operator dashboard; and an Intel OpenVINO binding for C++/Go integration. The release represents 350+ commits since v0.2.0. The router ranked #1 on RouterArena with a weighted Arena Score of 75.4, and the release adds native Anthropic `/v1/messages` protocol support alongside OpenAI compatibility.
For reference (6)
Echo-Memory: Controlled Study of Memory Mechanisms in Action-Conditioned Video World Models
Microsoft Research
Echo-Memory (arXiv:2606.09803) presents a controlled framework for isolating and comparing memory mechanisms in action-conditioned video generation models. By fixing the backbone and varying only memory components, the paper disentangles four axes: capacity, compression, read-out strategy, and recurrence. Key findings: raw context is stronger than expected; aggressive compression hurts fidelity; block-wise state-space recurrence wins on open-domain return tasks; and replay quality is not a reliable proxy for true scene memory.
SWE-Explore: Benchmarking Repository Exploration as the Binding Constraint in Coding Agents
Shanghai Jiao Tong University
SWE-Explore (arXiv:2606.07297) introduces a benchmark of 848 GitHub issues across 10 programming languages and 203 repositories to evaluate repository exploration, the step before patch generation where an agent must locate relevant code. Classical retrievers (BM25, TF-IDF) perform near the random baseline; agentic explorers reach >65% file-level hit rates but only ~15% line-level recall. Swapping the underlying model between GPT-5 and Gemini shifts absolute performance but not the line-level recall bottleneck, suggesting the limit is exploration strategy rather than raw model capability.
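To make the gap between file-level hit rate and line-level recall concrete, here is a minimal Python sketch of how the two metrics could be computed for one issue. The digest does not give the paper's exact definitions, so both are assumptions: a "hit" is taken to mean the explorer surfaced at least one gold file, and recall is the share of gold lines it returned. The helper names and the example paths are hypothetical.

```python
from typing import Dict, Set

# File path -> set of relevant line numbers (assumed representation).
Locations = Dict[str, Set[int]]


def file_hit(gold: Locations, pred: Locations) -> bool:
    """True if at least one gold file appears among the predicted files."""
    return any(path in pred for path in gold)


def line_recall(gold: Locations, pred: Locations) -> float:
    """Fraction of gold lines that the explorer also returned."""
    total = sum(len(lines) for lines in gold.values())
    if total == 0:
        return 0.0
    found = sum(len(lines & pred.get(path, set())) for path, lines in gold.items())
    return found / total


# Hypothetical issue: the explorer finds the right file but mostly the wrong lines.
gold = {"src/parser.py": {10, 11, 12, 13}}
pred = {"src/parser.py": {13, 40, 41}, "src/lexer.py": {5}}

print(file_hit(gold, pred), line_recall(gold, pred))  # True 0.25
```

Under these assumed definitions an explorer can count as a file-level hit while recovering only a quarter of the relevant lines, which is the shape of the bottleneck the item describes.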
On the Geometry of On-Policy Distillation: A Training Paradigm Distinct from SFT and RLVR
Hong Kong University of Science and Technology
This paper (arXiv:2606.07082) characterizes on-policy distillation (OPD) as a distinct training paradigm by analyzing its parameter-space geometry. OPD leaves 51.6% of weights unchanged (between SFT at 8.1% and RLVR at 77.2%), avoids principal directions more strongly than SFT, and exhibits 'subspace locking': cumulative updates rapidly enter a stable low-dimensional channel. Constraining training to this early-formed subspace preserves performance, and the subspace is robust to token sparsification and off-policy rollouts but changes when objectives are mixed.
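The "weights unchanged" figures can be read as a simple comparison between two checkpoints. The sketch below is one way to compute such a fraction in Python. It is not the paper's procedure: the digest does not say how "unchanged" is decided, so the tolerance parameter `tol` and the flat-list representation of weights are assumptions made for illustration.

```python
from typing import Sequence


def unchanged_fraction(before: Sequence[float], after: Sequence[float], tol: float = 0.0) -> float:
    """Share of weights whose value moved by at most `tol` (assumed criterion)."""
    if len(before) != len(after):
        raise ValueError("checkpoints must have the same number of weights")
    if not before:
        return 0.0
    same = sum(1 for b, a in zip(before, after) if abs(a - b) <= tol)
    return same / len(before)


# Hypothetical flattened weights before and after training.
before = [0.5, -1.25, 0.0, 2.0]
after = [0.5, -1.25, 0.125, 2.0]

print(unchanged_fraction(before, after))  # 0.75
```

With `tol=0.0` only bit-identical values count as unchanged; a small positive tolerance would treat sub-threshold updates as unchanged as well, which can shift the reported fraction considerably.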
Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight
Rutgers University
This paper (arXiv:2606.00424) proposes Progressive On-Policy Critique Distillation (OPCD), where a weak model acts as a critic providing revision directions rather than binary judgments. The key insight is that weak critics only need to offer non-misleading improvement directions, not correct final answers, enabling strong models to leverage their own knowledge for self-improvement. The method filters high-quality critiques and distills critic-guided behaviors into the strong model through adaptive self-teaching. The paper reports improvements on reasoning and alignment benchmarks across training iterations.
OpenAI Codex CLI v0.138.0: Desktop Handoff, Structured Plugin Output, and Account Token Visibility
OpenAI
Version 0.138.0 (June 8, 2026) adds desktop handoff for the `/app` command on macOS and Windows, local image file path exposure to models for follow-up edits, enhanced reasoning effort selection with fallback shortcuts for terminals missing Alt bindings, account token usage visibility and v2 personal access token support, and structured JSON output for plugin automation (`codex plugin list --json`). TUI streaming optimizations eliminate blank spacing artifacts, and workspace instruction loading is improved for remote and symlinked environments. An alpha v0.139.0 build was also cut on June 9.
Ollama v0.30.7: Hermes Desktop Support, Gemma 4 QAT, and Nemotron-3-Ultra
Ollama
Ollama v0.30.7 (June 7, 2026) adds native Windows support for Hermes Desktop and aligns OpenAI-compatible API model lists with available tags. The v0.30.6 release (June 5) added Gemma 4 models optimized via Quantization-Aware Training (QAT), reducing memory requirements ~72% while maintaining near-original quality. v0.30.4 (June 3) introduced Nemotron-3-Ultra support for reasoning/long-running agent workflows and fixed Metal GPU offload for multimodal models on Apple Silicon. v0.30.2 added Qwen Code support and improved token accounting for cached prompts.