Daily digest
14 items · ~14 min · Week 2026-W24
Must-read (4)
US Government Orders Anthropic to Disable Claude Fable 5 and Mythos 5 Globally
Anthropic · On June 12, 2026, the US Commerce Department issued an export-control directive requiring Anthropic to block all access to Claude Fable 5 and Mythos 5 for foreign nationals, including Anthropic's own foreign-national employees. Because selective enforcement in real time was impossible, Anthropic disabled both models globally within hours of the order. The company complied while publicly disputing the necessity: it argued the jailbreak the government cited was narrow, non-universal, and comparable to weaknesses in other commercially available models, and warned that applying this threshold industry-wide 'would essentially halt all new model deployments.' All other Anthropic models remained available. Claude Code v2.1.177 (June 13) silently redirects any Fable 5 model selection to Claude Opus 4.8.
Moonshot AI Releases Kimi K2.7-Code: 1T-Parameter Open-Weight Coding Model with Vision
Moonshot AI · Moonshot AI released Kimi K2.7-Code on June 12, 2026, with weights posted to HuggingFace (moonshotai/Kimi-K2.7-Code) under Modified MIT. The model is a 1-trillion-parameter MoE with 32B active parameters per token (384 experts, 8 selected), a 256K-token context window, and a 400M-parameter MoonViT vision encoder for image and video input. Vendor benchmarks show +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite over K2.6, with approximately 30% fewer reasoning tokens. API pricing: $0.95/$4.00 per million input/output tokens. Cloudflare Workers AI added the model on release day.
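To make the quoted pricing concrete, here is a minimal cost-estimate sketch. It assumes flat per-token billing at the listed $0.95/$4.00 rates; any cache discounts, tiering, or minimum charges are not covered by the item above and are ignored. The function name `estimate_cost_usd` is illustrative only.

```python
# Rates taken from the digest item; everything else here is an assumption.
INPUT_USD_PER_MTOK = 0.95
OUTPUT_USD_PER_MTOK = 4.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD under flat per-token billing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A request that fills most of the 256K context and emits a 20K-token answer.
    print(round(estimate_cost_usd(200_000, 20_000), 4))
```

Under these assumptions, a request with 200,000 input tokens and 20,000 output tokens comes to about $0.27 ($0.19 for input plus $0.08 for output).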
MiniMax Sparse Attention: 28× Compute Reduction at 1M-Token Context with No Quality Loss
MiniMax · MiniMax published a paper introducing a blockwise sparse attention mechanism built on Grouped Query Attention that achieves a 28.4× reduction in per-token attention compute at 1M-token context while matching the quality of full attention. The technique uses an Index Branch to score and select relevant KV blocks, with a Main Branch performing exact attention over the selected blocks. It underpins MiniMax M3, the first open-weight model combining frontier coding capability, 1M-token context, and native multimodality in a single architecture. The paper received 251 upvotes on HuggingFace Daily Papers.
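The two-branch idea (score KV blocks, then run exact attention only over the selected ones) can be illustrated with a toy single-query sketch in pure Python. This is not the paper's implementation: the block-scoring rule used here (query dotted with the mean key of each block) is an assumption chosen for simplicity, Grouped Query Attention is not modeled, and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_attention(q, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def select_blocks(q, keys, block_size, top_k):
    """Index branch (assumed scoring rule): rank blocks by q . mean(block keys)."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(q, mean_key), start))
    scored.sort(reverse=True)
    # Return block start offsets in original sequence order.
    return sorted(start for _, start in scored[:top_k])


def sparse_attention(q, keys, values, block_size, top_k):
    """Main branch: exact attention restricted to the selected KV blocks."""
    sel_keys, sel_values = [], []
    for start in select_blocks(q, keys, block_size, top_k):
        sel_keys.extend(keys[start:start + block_size])
        sel_values.extend(values[start:start + block_size])
    return exact_attention(q, sel_keys, sel_values)
```

In this sketch the attention cost scales with `top_k * block_size` rather than with the full sequence length, which is the source of the compute saving; selecting every block reduces it to ordinary full attention.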
MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math
MiniMax · MiniMax published MaxProof, a framework for training and test-time scaling of mathematical proof using the MiniMax M3 model series. It trains three capabilities (proof generation, verification, and critique-conditioned repair) using a generative verifier engineered for low false-positive rate. At inference, the model acts simultaneously as generator, verifier, refiner, and ranker, selecting a final proof via tournament ranking. MaxProof achieves 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the gold-medal threshold on both. Published on arXiv (2606.13473) with 75 upvotes on HuggingFace Daily Papers.
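The final selection step, tournament ranking, can be pictured with a small sketch. The digest item does not say which tournament format MaxProof uses, so the round-robin scheme below is an assumption, and `prefer` is a hypothetical stand-in for the model acting as a pairwise ranker.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def tournament_rank(
    candidates: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> List[str]:
    """Round-robin tournament: return candidates ordered by pairwise wins.

    `prefer(a, b)` is a hypothetical judge returning True when proof `a`
    is judged better than proof `b`.
    """
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # sorted() is stable, so ties keep their original generation order.
    order = sorted(wins, key=lambda i: -wins[i])
    return [candidates[i] for i in order]
```

The first element of the returned list would be the proof submitted as the final answer. A round-robin needs a quadratic number of comparisons in the number of candidates; a single-elimination bracket would need fewer at the cost of being more sensitive to a single wrong judgment.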
Worth knowing (8)
Zhipu AI Releases GLM-5.2: 744B MoE with 1M-Token Context and Coding-First Design
Zhipu AI · Zhipu AI (Z.ai) released GLM-5.2 on June 13, 2026, deploying it to all tiers of the GLM Coding Plan (Lite, Pro, Max). Built on a 744B-parameter MoE architecture with 40B active parameters, the model offers a 1-million-token context window (model ID: glm-5.2[1m]) and a maximum output of 131K tokens. It introduces a dual thinking-effort system (High and Max modes) designed for long-horizon agentic software engineering tasks. General API access, integration into the Z.ai chatbot, and open-source weights under MIT are scheduled for the following week. No third-party benchmarks were published at launch.
EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments
MIT / NUS / Salesforce · EvoArena is a benchmark that models environments as sequences of progressive updates across terminal, software, and social domains, exposing a gap in current agent evaluation that assumes static environments. Top agents currently achieve only ~40% accuracy. The paper also proposes EvoMem, a patch-based memory paradigm that records environment changes as structured update histories; EvoMem improves chain-level accuracy by 3.7% on EvoArena and 4–6% on GAIA and LoCoMo benchmarks. Published on arXiv (2606.13681) with 121 upvotes on HuggingFace Daily Papers.
WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate
Microsoft Research · WeaveBench introduces 114 real-world tasks requiring AI agents to combine GUI observations/actions with CLI and code operations in a single trajectory; it is the first benchmark explicitly targeting this hybrid-interface setting. The best current frontier model achieves only a 41.2% pass rate on these long-horizon tasks. Published on arXiv (2606.09426) with 95 upvotes on HuggingFace Daily Papers.
InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation
CUHK Multimedia Lab · InterleaveThinker is a multi-agent pipeline (a planner agent and a critic agent) that equips any image generator with the ability to produce interleaved text-image sequences. The planner organizes input sequences; the critic evaluates outputs and refines instructions for regeneration. Training uses SFT datasets (80K planner, 112K critic examples) and GRPO reinforcement learning with step-wise rewards. The system achieves performance comparable to GPT-5-level models on interleaved generation benchmarks (WISE, RISE). Published on arXiv (2606.13679) with 124 upvotes on HuggingFace Daily Papers.
Claude Code v2.1.177: Fable 5 Forced Fallback to Opus 4.8, Bedrock Cache Fix, Security Patch
Anthropic · Claude Code v2.1.177 shipped on June 13, 2026. Due to the US government directive, all Fable 5 model selections are automatically redirected to Claude Opus 4.8 without user action. Other changes: session titles are now generated in the conversation language (configurable via the 'language' setting); a new 'footerLinksRegexes' setting enables regex-matched link badges in the footer; Bedrock credential caching now respects actual token expiration rather than a fixed 1-hour window; a security fix closes a loophole where blocked models could be bypassed via the 'availableModels' allowlist. Additional bug fixes cover copy/paste over tmux SSH, Remote Control model switching, and Linux sandbox with symlinked settings files.
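The credential-caching change above amounts to keying cache validity on the token's own expiry instead of a fixed timer. The sketch below illustrates that general pattern only; it is not Claude Code's implementation, and `Credentials`, `CredentialCache`, the `fetch` callback, and the 60-second safety margin are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    token: str
    expires_at: float  # Unix timestamp reported by the credential provider


class CredentialCache:
    """Caches credentials until their own expiry, not for a fixed window."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        skew_seconds: float = 60.0,  # refresh slightly early (assumed margin)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        # Refetch when nothing is cached or the token is about to expire.
        if self._cached is None or now >= self._cached.expires_at - self._skew:
            self._cached = self._fetch()
        return self._cached
```

With a fixed 1-hour window, a token issued with a 15-minute lifetime would be served from cache long after it stopped working; checking `expires_at` avoids that, and also avoids refetching early when a token is valid for longer than an hour.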
Moonshot AI Opens Kimi Work Desktop Agent with 300-Sub-Agent Swarm and WebBridge
Moonshot AI · On June 12, 2026, Moonshot AI opened Kimi Work for internal testing. It is a downloadable macOS/Windows desktop application for local AI agent execution. It scales to 300 parallel sub-agents, includes a WebBridge browser extension that reuses existing logged-in browser sessions for automation, and supports cron scheduling, local file access, Python script execution, and integration with A-share, Hong Kong, and US equity finance data. It reportedly runs on Kimi K2.6. Outputs include PowerPoint and Excel. The product page is live at kimi.com/products/kimi-work.
vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention
MiniMax · On June 12, 2026, the vLLM team published a blog post announcing day-0 serving support for MiniMax M3, a 456B-parameter open-weight model with a 1M-token context window, native multimodal input, and the MiniMax Sparse Attention (MSA) architecture. The open weights were released approximately June 10–11. Deployment requires the '--block-size 128' flag due to MSA's sparse/index cache requirements. AMD announced simultaneous day-0 support on Instinct GPUs. On Fireworks AI, M3 is available with pricing described as roughly 1/20th the cost of comparable closed models.
ElevenLabs Launches Avatars in ElevenCreative: TTS-Native AI Talking-Head Video
ElevenLabs · ElevenLabs launched Avatars in ElevenCreative, a workflow that pairs the company's AI speech synthesis with lip-synced talking-head video generation. Users upload a photo or write a prompt to create a persistent avatar identity, then generate video across different angles, outfits, and backgrounds while retaining identity consistency. Voice and lip-synced video are produced in a single step. A new Avatar node in Flows enables batch generation across scripts, languages, and voices. Available on all paid plans.
For reference (2)
Anthropic Publishes First Public Record: 52,000-Person Survey on US AI Attitudes
Anthropic · Anthropic released results from its first Anthropic Public Record on June 12, 2026. The survey of nearly 52,000 Americans measured hopes, fears, and governance preferences around AI, with responses collected November–December 2025. The data showed broad bipartisan consensus on major AI concerns. Anthropic intends to repeat the survey regularly and expand it internationally, framing it as a mechanism to ensure AI development reflects public input beyond existing Claude users.
OpenCode v1.17.5–v1.17.6: MCP Client Capabilities Declaration and Snowflake OAuth
SST · SST shipped two OpenCode releases on June 13, 2026. v1.17.6 formally declares OpenCode's supported MCP client capabilities, establishing a stable compatibility target for MCP server authors. v1.17.5 adds external browser OAuth for Snowflake Cortex (enabling auth without embedding credentials), improves project copy management and session move flows in the v2 API, recovers expired MCP sessions instead of leaving tools disconnected, returns structured MCP tool output in human-readable form, and fixes duplicate renderable IDs that could break TUI rendering. The desktop layer gains updated oc-2 color themes and improved terminal resize handling.