Daily digest
16 items · ~16 min · Week 2026-W20
Must-read (2)
OpenAI Launches Deployment Company with $4B+ Investment and Tomoro Acquisition
OpenAI · OpenAI announced the OpenAI Deployment Company on May 11, 2026, a venture majority-owned by OpenAI and backed by 19 investment firms with more than $4 billion in initial capital. The investor group is led by TPG, with Advent, Bain Capital, and Brookfield as co-leads. At the same time, OpenAI agreed to acquire Tomoro, an Edinburgh-based AI consulting firm. The deal brings in approximately 150 Forward Deployed Engineers, who will embed with enterprise clients and help them ship frontier AI into production workflows.
Pixal3D: Pixel-Aligned Image-to-3D Generation Accepted at SIGGRAPH 2026
Tencent ARC Lab · Pixal3D, accepted at SIGGRAPH 2026, introduces a pixel-aligned image-to-3D generation paradigm. Instead of loosely injecting image features through attention, it explicitly lifts multi-scale pixel features into a 3D feature volume by back-projection, establishing direct pixel-to-3D correspondences. This yields near-reconstruction-level fidelity with detailed geometry and PBR textures. Code, a demo, and a Hugging Face model were released at the same time.
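Back-projection is the step that ties each 3D location to a pixel. The minimal pure-Python sketch below assumes a single pinhole camera at the origin looking down +z, hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), one feature scale, and nearest-neighbour sampling. Pixal3D lifts multi-scale features, and its exact projection and sampling scheme are not described in this digest.

```python
from typing import List, Optional, Tuple

Feature = List[float]
FeatureMap = List[List[Feature]]  # indexed [row][col] -> feature vector


def project(point: Tuple[float, float, float], fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-space point to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:  # behind the camera: no pixel correspondence
        return None
    return fx * x / z + cx, fy * y / z + cy


def back_project(feature_map: FeatureMap, grid: List[Tuple[float, float, float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Feature]:
    """Give every voxel center the feature of the pixel it projects onto.

    Uses nearest-neighbour sampling (rounding u, v). Voxels that project outside
    the image, or lie behind the camera, receive a zero feature vector.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    volume: List[Feature] = []
    for point in grid:
        uv = project(point, fx, fy, cx, cy)
        feat = [0.0] * channels
        if uv is not None:
            col, row = int(round(uv[0])), int(round(uv[1]))  # u -> column, v -> row
            if 0 <= row < height and 0 <= col < width:
                feat = list(feature_map[row][col])
        volume.append(feat)
    return volume
```

Each voxel now carries an explicit pixel correspondence. That is the property the item contrasts with attention-based feature injection, where no such one-to-one link exists.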
Worth knowing (7)
Alibaba Integrates Qwen AI with Taobao for End-to-End Agentic Shopping
Alibaba · Alibaba announced on May 11, 2026 that it is merging its Qwen AI platform with the Taobao e-commerce marketplace, replacing keyword-based product search with a conversational AI agent that can browse, compare, and complete purchases end to end across a catalog of more than 4 billion products. The integration includes virtual try-ons, 30-day price tracking, and Alipay-native checkout, all handled autonomously by the agent.
Mean Mode Screaming: Training Pathology Fix Enables 1000-Layer Diffusion Transformers
This paper identifies Mean Mode Screaming (MMS), a training collapse in which Diffusion Transformers (DiTs) at extreme depths suppress variation across tokens while the loss still appears stable. The proposed fix, Mean-Variance Split (MV-Split) Residuals, combines a separately gained centered residual update with a leaky trunk-mean replacement. This eliminates the collapse events and enables stable training of 1000-layer DiTs.
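The pathology is easiest to see by splitting a layer's hidden states into their mean over tokens (the "mean mode") and the centered remainder. The plain-Python sketch below computes that split and a centered-energy ratio that falls toward zero as tokens collapse onto their mean. The `mv_split_residual` function is only a hypothetical reading of the one-line description above: a gain on the centered part of the update, plus a trunk mean that is partly replaced by the update's mean via a leak factor. The paper's actual formulation, gains, and normalization may differ.

```python
from typing import List, Tuple

Tokens = List[List[float]]  # [num_tokens][hidden_dim]


def split_mean_centered(x: Tokens) -> Tuple[List[float], Tokens]:
    """Split hidden states into the per-dimension token mean and centered residuals."""
    n = len(x)
    mean = [sum(tok[d] for tok in x) / n for d in range(len(x[0]))]
    centered = [[tok[d] - mean[d] for d in range(len(mean))] for tok in x]
    return mean, centered


def centered_energy_ratio(x: Tokens) -> float:
    """Fraction of the total squared norm carried by token variation.

    Values near 0 mean every token sits close to the shared mean, i.e. the kind
    of collapse MMS describes, which the loss curve alone may not reveal.
    """
    mean, centered = split_mean_centered(x)
    centered_energy = sum(v * v for tok in centered for v in tok)
    mean_energy = len(x) * sum(m * m for m in mean)
    total = centered_energy + mean_energy
    return centered_energy / total if total > 0 else 0.0


def mv_split_residual(x: Tokens, update: Tokens, gain: float, leak: float) -> Tokens:
    """HYPOTHETICAL MV-Split-style residual step, not the paper's exact rule.

    centered part: x_centered + gain * update_centered (separately gained)
    mean part:     leak * mean(x) + (1 - leak) * mean(update) (leaky replacement)
    """
    x_mean, x_cen = split_mean_centered(x)
    u_mean, u_cen = split_mean_centered(update)
    new_mean = [leak * a + (1.0 - leak) * b for a, b in zip(x_mean, u_mean)]
    return [
        [new_mean[d] + xc[d] + gain * uc[d] for d in range(len(new_mean))]
        for xc, uc in zip(x_cen, u_cen)
    ]
```

Tracking `centered_energy_ratio` per layer during training is one cheap way to notice a collapse that a flat loss curve would hide.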
Flow-OPD: On-Policy Distillation Pushes GenEval +29 Points on Stable Diffusion 3.5
Flow-OPD is the first framework to integrate on-policy distillation into flow-matching text-to-image models. It works in two stages: first, specialized teacher models are each fine-tuned with GRPO on a single reward; then their skills are consolidated through dense, trajectory-level vector-field supervision with Manifold Anchor Regularization. On Stable Diffusion 3.5 Medium, this raises GenEval by 29 points (63→92) and OCR accuracy by 35 points (59→94), surpassing the individual teacher models.
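"On-policy" here means the student generates its own trajectory, and the teacher's velocity field supervises every state the student actually visits. The alternative would be to supervise only states drawn from teacher or data trajectories. The toy scalar sketch below assumes hypothetical 1-D velocity fields written as plain Python functions, Euler integration, and a single teacher. GRPO teacher training and Manifold Anchor Regularization are not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical 1-D velocity field v(x, t); real models predict high-dimensional fields.
Velocity = Callable[[float, float], float]


def student_trajectory(v_student: Velocity, x0: float, steps: int) -> List[Tuple[float, float]]:
    """Roll out the student's own path with Euler steps from t=0 toward t=1.

    Returns the (x, t) state visited before each step.
    """
    dt = 1.0 / steps
    x, t = x0, 0.0
    states = []
    for _ in range(steps):
        states.append((x, t))
        x = x + dt * v_student(x, t)
        t = t + dt
    return states


def on_policy_distill_loss(v_student: Velocity, v_teacher: Velocity,
                           x0: float, steps: int) -> float:
    """Mean squared student-vs-teacher velocity gap over the student's own states."""
    states = student_trajectory(v_student, x0, steps)
    gaps = [(v_student(x, t) - v_teacher(x, t)) ** 2 for x, t in states]
    return sum(gaps) / len(gaps)
```

The supervision is dense because every visited state contributes a term, not just the final sample. It is on-policy because those states come from the student's rollout.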
Anthropic's Claude Platform Reaches General Availability on AWS
Anthropic · Anthropic made its native Claude Platform generally available through Amazon Web Services on May 11, 2026. This makes AWS the first cloud provider to offer the full native Claude Platform experience with AWS billing and IAM authentication. The offering spans 19 global regions and includes Claude Managed Agents (beta), web search and fetch, code execution, the Files API, Skills, the MCP connector, prompt caching, citations, and batch processing.
Claude Code v2.1.139–v2.1.140: Agent View Research Preview and /goal Command
Anthropic · Claude Code v2.1.139 (May 11) ships Agent View as a research preview: `claude agents` opens a single dashboard listing all running, blocked, and completed sessions, so developers can supervise parallel autonomous coding tasks from one terminal pane. The companion /goal command lets users declare a completion condition. Claude then keeps iterating autonomously across turns while a live overlay shows elapsed time, turns, and tokens. v2.1.140 (May 12) followed with bug fixes for a /goal hang involving hook restrictions, a `claude --bg` crash on enterprise endpoints, a Windows event-loop stall, and Read tool offset validation.
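The /goal behaviour follows a familiar agent pattern: keep taking turns until a declared completion check passes, and track the elapsed time, turns, and tokens that the overlay displays. The generic sketch below assumes hypothetical `run_turn` and `goal_met` callables and a turn cap. It does not reflect Claude Code's internals or any real CLI API.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoalStats:
    turns: int = 0
    tokens: int = 0
    elapsed_s: float = 0.0
    done: bool = False


def run_until_goal(run_turn: Callable[[int], int], goal_met: Callable[[], bool],
                   max_turns: int = 50) -> GoalStats:
    """Keep taking turns until goal_met() is true or max_turns is reached.

    run_turn(turn_index) performs one hypothetical agent turn and returns the
    number of tokens it consumed.
    """
    stats = GoalStats()
    start = time.monotonic()
    while stats.turns < max_turns and not goal_met():
        stats.tokens += run_turn(stats.turns)
        stats.turns += 1
        # A live overlay would redraw turns / tokens / elapsed time here.
        stats.elapsed_s = time.monotonic() - start
    stats.done = goal_met()
    stats.elapsed_s = time.monotonic() - start
    return stats
```

The cap matters: a completion condition that can never be satisfied would otherwise loop forever, so a real tool presumably needs some stopping rule of its own.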
AWS MCP Server and Agent Toolkit Reach General Availability
AWS · AWS highlighted the general availability of two agent tools. The AWS MCP Server is a managed remote MCP endpoint that provides secure, IAM-governed access to all AWS services through a fixed tool set. The Agent Toolkit for AWS is a production-ready suite of skills, guidance, and sandboxed script execution, included at no extra charge. Both were announced on May 6 and featured in the AWS Weekly Roundup on May 11.
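MCP runs over JSON-RPC 2.0, so an agent using a remote MCP endpoint invokes a tool by sending a `tools/call` request. The minimal sketch below only builds the request payload. The tool name and arguments are hypothetical placeholders, not the AWS MCP Server's actual tool set, and the transport and IAM request signing are omitted.

```python
import json
from typing import Any, Dict


def make_tools_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
payload = make_tools_call(1, "example_aws_tool", {"service": "s3", "operation": "ListBuckets"})
```

In practice, a client would first discover the server's real tool names with a `tools/list` request. A fixed tool set keeps that list stable.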
Fake OpenAI Repo Hits #1 Trending on Hugging Face with 244K Downloads, Delivers Infostealer
A repository named 'Open-OSS/privacy-filter' copied OpenAI's legitimate Privacy Filter model card nearly verbatim and reached #1 on Hugging Face's trending list within 18 hours, accumulating around 244,000 downloads before it was removed. Its loader.py file delivered a six-stage, Rust-based infostealer that harvested browser credentials, Discord tokens, crypto wallet keys, and SSH credentials. The campaign has suspected ties to the Silver Fox threat group. Six related repositories impersonating Qwen3, DeepSeek, and other popular models were also found.
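The attack depended on users executing a Python loader shipped inside what looked like a model repository. The small defensive sketch below assumes you already have the repository's file list as plain strings, and flags files that can run code when used or loaded. The suffix list is an illustrative assumption, not an exhaustive or official policy.

```python
from typing import Iterable, List

# Illustrative, non-exhaustive: scripts and binaries, plus pickle-based
# checkpoint formats, which can execute code when unpickled.
RISKY_SUFFIXES = (".py", ".sh", ".exe", ".dll", ".so",
                  ".pkl", ".pickle", ".bin", ".pt", ".pth")


def flag_risky_files(filenames: Iterable[str]) -> List[str]:
    """Return the files worth reviewing before running or loading a repo's contents."""
    return sorted(f for f in filenames if f.lower().endswith(RISKY_SUFFIXES))
```

A repository that only needs weights and configs (for example `.safetensors` plus JSON) yet ships a loader script deserves a closer look before anything is executed.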
For reference (7)
OpenAI Retires DALL-E 2 and DALL-E 3 APIs on May 12
OpenAI · On May 12, 2026, OpenAI retired DALL-E 2 and DALL-E 3 from its API, after notifying developers in November 2025. All calls to /v1/images/generations that specify either model now return errors. Developers must migrate to gpt-image-1 or gpt-image-1-mini. These models use a different response format (base64-encoded PNG instead of URLs) and token-based pricing rather than per-image charges.
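For migrating code, the practical change is that an image now arrives as a base64 string instead of a URL to fetch. The minimal sketch below handles the decoding side. It assumes the response has already been parsed into a dict with the images API's `data[i]["b64_json"]` shape. The sample response is built locally for the example, and the network call itself is omitted.

```python
import base64
from typing import Any, Dict, List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_images(response: Dict[str, Any]) -> List[bytes]:
    """Decode every base64 image in an images-API-style response into raw bytes.

    Expects PNG output, per the format described above; raises otherwise.
    """
    images = []
    for item in response.get("data", []):
        raw = base64.b64decode(item["b64_json"])
        if not raw.startswith(PNG_SIGNATURE):
            raise ValueError("decoded payload is not a PNG")
        images.append(raw)
    return images


# Stand-in response: the PNG signature plus a few bytes, not a real image.
fake = {"data": [{"b64_json": base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")}]}
```

Code that previously downloaded `data[0]["url"]` can instead write `decode_images(response)[0]` straight to a `.png` file opened in binary mode.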
Soohak: 64 Mathematicians Build Research-Level Benchmark That Stumps Frontier LLMs
Seoul National University · Soohak is a 439-problem benchmark written from scratch by 64 professional mathematicians to test whether frontier LLMs can reason at the level required to advance mathematical knowledge. Top models score only 10.4–30.4% on the challenge problems (Claude Opus 4.5 at 10.4%, GPT-5 at 26.4%, Gemini 3 Pro at 30.4%). A novel refusal subset tests whether models can detect ill-posed problems and abstain; no model exceeds 50% on this dimension.
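The refusal subset changes what counts as a correct response. On an ill-posed problem, the right behaviour is to abstain; on a well-posed problem, an abstention simply earns nothing. The hypothetical scoring sketch below implements that two-part protocol. It assumes an `ABSTAIN` sentinel and exact-match answers, and Soohak's actual grading procedure is not described in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ABSTAIN = "ABSTAIN"  # hypothetical sentinel a model emits to decline


@dataclass
class Problem:
    well_posed: bool
    reference: Optional[str]  # None for ill-posed problems


def score(problems: List[Problem], responses: List[str]) -> Tuple[float, float]:
    """Return (accuracy on well-posed problems, accuracy on the refusal subset)."""
    solved = posed = refused = ill_posed = 0
    for prob, resp in zip(problems, responses):
        if prob.well_posed:
            posed += 1
            solved += int(resp == prob.reference)
        else:
            ill_posed += 1
            refused += int(resp == ABSTAIN)
    return (solved / posed if posed else 0.0,
            refused / ill_posed if ill_posed else 0.0)
```

Reporting the two numbers separately prevents a model from inflating its refusal score by abstaining everywhere, because that would zero out its challenge-problem accuracy.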
AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40
AutoTTS proposes an environment-driven framework in which LLM agents automatically discover test-time scaling (TTS) strategies, instead of researchers hand-crafting them. The method formulates width-depth TTS as controller synthesis over pre-collected reasoning trajectories. It discovers a Confidence Momentum Controller (CMC) that improves the accuracy-cost tradeoff over manual baselines and generalizes across benchmarks and model scales. A full run costs only $39.90 and takes 160 minutes.
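Because the search runs over pre-collected trajectories, a candidate controller can be scored offline by replaying them and tallying accuracy against token cost. The sketch below does this for one hypothetical width-only controller. The controller keeps drawing pre-collected samples until an exponential moving average ("momentum") of majority-vote agreement clears a threshold. It illustrates the replay-and-score loop under those assumptions. It is not the Confidence Momentum Controller the paper discovered, whose exact rule is not given here, and depth is not modeled.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[str, int]  # (final answer, tokens spent), pre-collected offline


def momentum_controller(samples: List[Sample], beta: float = 0.5,
                        threshold: float = 0.8, min_samples: int = 2) -> Tuple[str, int]:
    """Consume samples one at a time; stop once smoothed agreement is high enough.

    Returns (majority answer, total tokens consumed). Ties go to the answer seen first.
    """
    votes: Counter = Counter()
    momentum, cost, answer = 0.0, 0, ""
    for i, (ans, tokens) in enumerate(samples, start=1):
        votes[ans] += 1
        cost += tokens
        answer, top = votes.most_common(1)[0]
        agreement = top / i  # share of samples so far that match the majority
        momentum = beta * momentum + (1.0 - beta) * agreement
        if i >= min_samples and momentum >= threshold:
            break
    return answer, cost


def replay(problems: List[Tuple[List[Sample], str]], **kwargs) -> Tuple[float, int]:
    """Replay the controller over (samples, gold answer) pairs -> (accuracy, total cost)."""
    correct, total_cost = 0, 0
    for samples, gold in problems:
        pred, cost = momentum_controller(samples, **kwargs)
        correct += int(pred == gold)
        total_cost += cost
    return correct / len(problems), total_cost
```

Since replay needs no new model calls, sweeping `beta` and `threshold` to trace an accuracy-cost curve costs almost nothing. That cheapness is the kind of property that could let an agent-driven search run on a small budget.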
GitHub Copilot CLI v1.0.45: /autopilot Toggle and /fork Session Branching
GitHub · GitHub Copilot CLI v1.0.45 (May 11) adds /autopilot to toggle between interactive and fully autonomous modes, and a /fork command that branches the current session into an independent copy. It also aligns its OpenTelemetry output with the GenAI semantic conventions (MCP tool calls now get standard tool_call spans), adds a Windows PowerShell 5 fallback, and cuts startup time by roughly 1.5 seconds.
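Aligning with the OpenTelemetry GenAI semantic conventions means each tool invocation is recorded as a span with a conventional name and `gen_ai.*` attributes, so any OTel backend can group MCP tool calls consistently. The dependency-free sketch below only assembles such a span's name and attributes. It uses the `execute_tool` operation and `gen_ai.tool.*` attribute names as best understood here. These conventions are still evolving, so check the current spec. The tool name and call ID are hypothetical, and this is not Copilot CLI's code.

```python
from typing import Dict, Tuple


def tool_span(tool_name: str, call_id: str) -> Tuple[str, Dict[str, str]]:
    """Return (span name, attributes) for a GenAI-style tool-execution span."""
    attributes = {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": tool_name,
        "gen_ai.tool.call.id": call_id,
    }
    return f"execute_tool {tool_name}", attributes
```

With the `opentelemetry-api` package, the pair could be passed to `tracer.start_as_current_span(name, attributes=attributes)` around the actual MCP call.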
Cursor Launches Microsoft Teams Integration for Cloud Agent Delegation
Cursor · Cursor launched a Microsoft Teams integration on May 11 that lets users mention @Cursor in any Teams channel to delegate coding tasks to a cloud agent. Cursor reads the full thread for context, automatically picks the appropriate repository and model, and then opens a pull request for the team to review, all without leaving the chat interface.
OpenCode v1.14.45–v1.14.48: Built-in Customize Skill and Image Attachment Fixes
SST · SST shipped four OpenCode releases on May 10–11. v1.14.46 introduced a built-in `customize-opencode` skill for safer config edits; v1.14.47 restored prompt-editing keybindings and fixed model persistence across sessions; and v1.14.48 preserves original image attachments instead of downsampling them before sending them to the model.
ShengShu Technology Launches Vidu Claw: AI-Powered End-to-End Ad Production Platform
ShengShu Technology · ShengShu Technology launched Vidu Claw on May 12, 2026. The AI marketing platform, powered by the Vidu Q3 video model, takes a single marketing brief and outputs a complete advertising campaign, covering planning, scripting, storyboarding, and platform-ready video. Flash Mode delivers 1080p clips in 80–150 seconds, and the Video Plan subscription charges per completed ad rather than per credit.