Daily digest
14 items · ~14 min · Week 2026-W20
Must-read (1)
OpenAI Launches ChatGPT Personal Finance with Plaid Integration
OpenAI · OpenAI launched a personal finance preview for ChatGPT Pro subscribers in the US on May 15, 2026, letting users connect over 12,000 financial institutions via Plaid. The feature provides a dashboard covering portfolio performance, spending, subscriptions, and upcoming payments, and supports natural-language queries about budgeting, debt repayment, and financial planning. The launch follows OpenAI's acquisition of personal finance startup Hiro; Intuit integration is planned to enable tax impact analysis.
Worth knowing (6)
Orthrus: 7.8x Inference Speedup for Qwen3 via Autoregressive-Diffusion KV Sharing
Orthrus (arXiv 2605.12825) combines a frozen pretrained autoregressive LLM with a lightweight trainable diffusion module sharing the same KV cache, enabling parallel token generation with an exact intra-model consensus mechanism that produces lossless output. Applied to Qwen3 (1.7B, 4B, 8B), it achieves up to 7.8x tokens-per-forward-pass speedup with O(1) additional memory overhead. The GitHub implementation trended on Hacker News (34 points) and on GitHub's Python trending list on May 15–16.
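The draft-and-verify pattern behind such lossless parallel decoding can be illustrated with a toy sketch (everything below is a stand-in, not the Orthrus implementation: a deterministic toy rule replaces both the AR model and the diffusion drafter). Drafted tokens are accepted only while the AR model agrees, so the final sequence is identical to pure autoregressive decoding:

```python
def ar_next(prefix):
    # Stand-in for the frozen autoregressive model's greedy next token.
    return sum(prefix) % 7

def draft_block(prefix, k):
    # Stand-in for the diffusion module: proposes k tokens at once.
    out = list(prefix)
    for _ in range(k):
        out.append(sum(out) % 7)  # imagine one parallel forward pass
    return out[len(prefix):]

def consensus_decode(prompt, k=4, steps=3):
    """Accept drafted tokens only while the AR model agrees, so the
    output matches pure AR decoding exactly (the lossless property)."""
    seq = list(prompt)
    for _ in range(steps):
        accepted = []
        for t in draft_block(seq, k):
            ar = ar_next(seq + accepted)
            accepted.append(t if t == ar else ar)
            if t != ar:
                break  # stop at the first disagreement
        seq += accepted
    return seq
```

Because every accepted token is verified against the AR model, the speedup depends only on how often the drafter agrees; output quality is unchanged by construction.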
Causal Forcing++: 2-Step Distillation Enables Real-Time Interactive Video Generation
Tsinghua University · Causal Forcing++ (arXiv 2605.15141, 80 HF Daily upvotes) proposes causal consistency distillation to train 2-step frame-wise autoregressive video generation models, surpassing the SOTA 4-step Causal Forcing baseline on both quality and latency. Applied to action-conditioned world model generation, it substantially cuts training cost while maintaining fidelity. Enables real-time interactive video synthesis.
SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents
Zhejiang University / Meituan · SDAR (arXiv 2605.15155, 69 HF Daily upvotes) combines On-Policy Self-Distillation (OPSD) as a gated auxiliary objective alongside GRPO RL for multi-turn LLM agents. A sigmoid gate selectively amplifies teacher-endorsed tokens while attenuating distillation noise from imperfect rejections. Evaluated on Qwen2.5 and Qwen3 across ALFWorld, WebShop, and Search-QA, achieving +9.4%, +10.2%, and +7.0% improvements over baseline GRPO respectively.
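A minimal sketch of the gating idea (my own toy formulation, not the paper's exact objective): the gate opens where the teacher assigns higher likelihood to a token than the student does, so the distillation pressure concentrates on teacher-endorsed tokens and fades elsewhere.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_distill_loss(student_logp, teacher_logp, rl_loss, beta=1.0, lam=0.5):
    """rl_loss: scalar policy-gradient loss (e.g. from GRPO).
    student_logp / teacher_logp: per-token log-probs of the taken actions."""
    # Gate ~1 where the teacher endorses the token, ~0 where it does not.
    gate = sigmoid(beta * (teacher_logp - student_logp))
    # Gated likelihood term: only push up probability of endorsed tokens.
    aux = -np.mean(gate * student_logp)
    return rl_loss + lam * aux
```

Tokens the teacher itself finds unlikely receive a gate near zero, which is one way to read the "attenuating distillation noise" effect described above.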
SANA-WM: Minute-Scale 720p World Modeling on a Single GPU
NVIDIA · SANA-WM (arXiv 2605.15178, 54 HF Daily upvotes) is a 2.6B-parameter world model generating high-fidelity 720p video at minute scale with 6-DOF camera control. It uses hybrid linear attention to handle long sequences and a dual-branch camera control system. Generates 60-second clips on a single GPU; distilled versions run on consumer hardware. Trained in 15 days on 64 GPUs, significantly more efficient than comparable industrial systems.
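The paper's "hybrid" design isn't reproduced here, but the generic linear-attention building block it presumably relies on is easy to sketch: replacing softmax attention with a positive feature map lets the key-value summary be computed once, so memory stays constant in sequence length instead of quadratic.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n * d * d_v) attention: phi(Q) @ (phi(K)^T V), no n x n matrix."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V              # (d, d_v) summary, size independent of n
    z = q @ k.sum(axis=0)     # per-query normalizer
    return (q @ kv) / z[:, None]
```

Each output row is a convex combination of the rows of V, so if all values are equal the output is exactly that value, which makes a quick sanity check.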
MemLens: Benchmark for Multimodal Long-Term Memory in Vision-Language Models
NVIDIA · MemLens (arXiv 2605.14906, 62 HF Daily upvotes) evaluates long-term multimodal memory in vision-language models through 789 questions across five memory capabilities and four context lengths, testing 27 models and 7 memory-augmented agents. Key finding: long-context LVLMs succeed via direct visual grounding in short contexts but degrade sharply as conversations grow, while memory agents remain stable but lose visual fidelity. Multi-session reasoning challenges virtually all tested systems.
Claude Code v2.1.143: Plugin Dependency Enforcement, Cost Estimates, and Background Stability
Anthropic · Claude Code v2.1.143 shipped May 15 with plugin dependency enforcement (disable refuses when another plugin depends on the target; enable force-enables transitive dependencies), projected context cost estimates in the plugin marketplace, and a new worktree.bgIsolation:none setting for repos where worktrees are impractical. On Windows, PowerShell now passes -ExecutionPolicy Bypass by default across Bedrock, Vertex, and Foundry providers. Over 30 bug fixes address startup hangs caused by a corrupt .credentials.json, macOS Full Disk Access errors for background agents, repeated PowerShell process spawning in claude agents, and multiple background session reliability regressions.
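The dependency rules can be modeled as a small graph check. This is a hypothetical sketch (names and data structures are mine, not Claude Code's internals): disable fails if any enabled plugin transitively depends on the target, and enable pulls in the target's transitive dependencies first.

```python
def dependents_of(target, deps):
    """All plugins that transitively depend on target."""
    out, changed = set(), True
    while changed:
        changed = False
        for plugin, reqs in deps.items():
            if plugin not in out and (target in reqs or out & set(reqs)):
                out.add(plugin)
                changed = True
    return out

def disable(target, enabled, deps):
    # Refuse when an enabled plugin still depends on the target.
    blockers = dependents_of(target, deps) & enabled
    if blockers:
        raise ValueError(f"cannot disable {target!r}: required by {sorted(blockers)}")
    enabled.discard(target)

def enable(target, enabled, deps):
    # Force-enable transitive dependencies before the target itself.
    for req in deps.get(target, ()):
        enable(req, enabled, deps)
    enabled.add(target)
```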
For reference (7)
Sber Closes First GigaChat Enterprise Hardware Leasing Deal
Sber · On May 15, 2026, Sber (via SberLeasing and Salute for Business) completed Russia's first leasing transaction for the GigaChat Enterprise software-hardware complex. The client is a major Russian real estate developer that will use GigaChat to build an AI sales-manager assistant. The deal offers minimal upfront payment with 36-month leasing terms, making enterprise GenAI accessible without large capital expenditure.
Yandex Deploys Alice AI NFC Pendants at Night at the Museum Event
Yandex · On May 14, 2026, Yandex announced NFC-enabled Alice AI pendants distributed to visitors at Moscow's Night at the Museum event (May 16). Tapping the pendant to a smartphone opens an Alice AI chat for exhibit information and navigation. The deployment covers the Museum of Moscow, Pushkin State Museum, and Nesterenko Gallery, with AI photo zones that stylize visitor photos in the manner of museum exhibits.
OpenCode v1.15.1: Collapsible Thinking View and Pinned Sessions
SST · OpenCode v1.15.1 (May 16) adds a collapsible thinking view with inline expansion and pinned sessions with quick-switch slots in the session picker, and fixes duplicate prompt-history entries, broken file watching in repos where .git is a symlink, and multiline @-mention handling. The release follows v1.15.0 (Effect-based event system) and v1.14.51 (experimental background subagents), both shipped May 15.
GitHub Copilot: Grok Code Fast 1 Deprecated, User Memory Preferences for Pro
GitHub · Two Copilot changes shipped May 15: Grok Code Fast 1 was deprecated across all Copilot experiences (chat, inline edits, completions) — admins should switch to GPT-5 mini or Claude Haiku 4.5. Separately, Copilot Memory now supports user-level preferences for Pro and Pro+ subscribers, allowing stated or inferred preferences (commit message style, PR structure, communication tone) to follow users across all repositories and agents; manageable in personal Copilot Memory settings.
OpenAI Codex Alpha: Permissions Architecture Overhaul and Remote Control API
OpenAI · OpenAI Codex shipped three alpha pre-releases on May 15 (v0.131.0-alpha.19/21/22). Active commits reveal a large-scale permissions migration replacing SandboxPolicy with PermissionProfile throughout the codebase, plus runtimeWorkspaceRoots additions to app-server thread APIs. Additional work includes remote control API updates, memory prompt injection moved to an app-server extension, compact hook parity for remote compaction v2, and TUI restructuring into focused modules. Still alpha-only; no stable release announced.
Pydantic AI v1.97.0: New MCPToolset and GoogleProvider Split
Pydantic · Pydantic AI v1.97.0 (May 15) introduces MCPToolset using fastmcp-slim[client] and deprecates the older MCPServer* and FastMCPToolset implementations. GoogleProvider is split into two classes: GoogleProvider (id: google:) for Gemini API and GoogleCloudProvider (id: google-cloud:) for Vertex AI. OnlineEvaluator gains run_on_errors capability. Agent.to_a2a() and bundled fasta2a integration are deprecated in favor of the external fasta2a package.
llama.cpp b9161/b9169: Codex CLI Compatibility and Qwen3A Multimodal Support
ggml-org · llama.cpp b9161 (May 15) adds Codex CLI compatibility by detecting and skipping unsupported Responses API tools with a warning instead of hard-failing, enabling local models as backends for the OpenAI Codex CLI workflow. b9169 adds MTMD (multimodal) chunk support and fixes preprocessing for Qwen3A, including audio token handling corrections and chunk size limits to prevent OOM. b9174 (May 16) restructures the WebUI into tools/ui with updated CMake variables.
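The warn-and-skip behavior amounts to filtering the request's tool list rather than rejecting the whole request. A hypothetical sketch (SUPPORTED_TOOL_TYPES and the warning text are assumptions of mine, not llama.cpp's actual code, which is C++):

```python
import warnings

# Assumption for illustration: only plain function tools are handled.
SUPPORTED_TOOL_TYPES = {"function"}

def filter_tools(tools):
    """Drop unsupported Responses API tools with a warning instead of
    failing the whole request."""
    kept = []
    for tool in tools:
        if tool.get("type") in SUPPORTED_TOOL_TYPES:
            kept.append(tool)
        else:
            warnings.warn(f"skipping unsupported tool type: {tool.get('type')!r}")
    return kept
```

The design choice matters for CLI clients like Codex, which may always send tool types a local server cannot honor; degrading gracefully keeps the rest of the request usable.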