Daily digest
17 items · ~17 min · Week 2026-W27
Worth knowing (11)
Google Releases Nano Banana 2 Lite: 4-Second Images at $0.034 per 1,000
Google DeepMind · Google released Nano Banana 2 Lite (Gemini 3.1 Flash-Lite Image) to general availability on June 30, generating images in approximately 4 seconds at $0.034 per 1,000 images. It is available in Google AI Studio and the Gemini API, and is rolling out to AI Mode in Google Search, the Gemini app, NotebookLM, and Google Photos. All outputs include SynthID watermarking.
Meta Announces "Meta Compute" Cloud Business to Monetize Surplus AI Infrastructure
Meta AI · Meta revealed plans on July 1 to launch a cloud business (working name "Meta Compute") selling access to its AI infrastructure to outside customers. Two tiers are planned: raw compute rental and hosted model access. The effort is led by infrastructure head Santosh Janardhan, Meta Superintelligence Labs chief Daniel Gross, and president Dina Powell McCormick. Meta's stock rose approximately 9% on the news. No pricing or launch date has been announced.
DeepSeek Confirms V4 Official Launch for Mid-July with Peak-Time API Pricing
DeepSeek · DeepSeek confirmed around July 1 that the official release of DeepSeek V4 is scheduled for mid-July, following its April 24 preview. V4 ships as V4-Pro (1.6T total / 49B active params) and V4-Flash (284B total / 13B active), both with a 1M-token context window. For the first time, DeepSeek introduces peak/off-peak API pricing: prices double during 9 AM–12 PM and 2–6 PM Beijing time. Legacy model names deepseek-chat and deepseek-reasoner retire on July 24.
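For callers that want to schedule batch work around the peak windows, the check reduces to converting a timestamp to Beijing time and testing it against the two ranges. The sketch below is illustrative only: it assumes the doubling applies to the API price, that each window includes its start and excludes its end, and that the windows apply every day. The names `PEAK_WINDOWS`, `is_peak`, and `price_multiplier` are hypothetical and not part of any DeepSeek SDK.

```python
from datetime import datetime, time, timedelta, timezone

# Beijing time is UTC+8 year-round (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))

# Peak windows from the item above; boundary handling is an assumption.
PEAK_WINDOWS = [
    (time(9, 0), time(12, 0)),   # 9 AM to 12 PM
    (time(14, 0), time(18, 0)),  # 2 PM to 6 PM
]
PEAK_MULTIPLIER = 2.0  # "doubles" during peak hours


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls inside a peak window."""
    local = moment.astimezone(BEIJING).time()
    return any(start <= local < end for start, end in PEAK_WINDOWS)


def price_multiplier(moment: datetime) -> float:
    """Return the assumed multiplier on the base rate at the given moment."""
    return PEAK_MULTIPLIER if is_peak(moment) else 1.0
```

Pass timezone-aware datetimes (for example `datetime.now(timezone.utc)`); a naive datetime would be interpreted in the machine's local zone.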
OpenAI Releases GeneBench-Pro, a Frontier Benchmark for AI Agents in Biology
OpenAI · On June 30, OpenAI released GeneBench-Pro, a 129-problem benchmark testing AI judgment across genomics, cancer biology, clinical diagnostics, and pharmacogenomics. Problems require sequential judgment calls that a human expert would take 20–40 hours to resolve. GPT-5.6 Sol scores 28.7% (31.5% in Pro mode); Claude Opus 4.8 scores 16.0%. Ten representative questions are open-sourced on Hugging Face.
Anthropic Proposes Industry-Wide Cyber Jailbreak Severity Scale
Anthropic · In a post published July 2, Anthropic detailed Fable 5's four-tier cybersecurity classifier and proposed the Cyber Jailbreak Severity (CJS) scale, running from CJS-0 through CJS-4, which scores jailbreaks on capability gain, attack breadth, ease of weaponization, and discoverability. The scale was developed with Project Glasswing partners including Amazon, Microsoft, and Google, and is offered for industry-wide adoption.
Program-as-Weights: Compile-Once Adapter Paradigm Matches 32B Models at 1/50 the Memory
Researchers from the University of Waterloo introduce Program-as-Weights (PAW), where a 4B-parameter compiler generates small reusable adapter weights for tasks that resist rule-based solutions. A 0.6B Qwen3 interpreter guided by these adapters matches a 32B model while using 1/50th the inference memory and running at 30 tokens/second on a MacBook M3. The authors also release FuzzyBench, a 10-million-example training dataset.
PerceptionRubrics: Atomic Rubric Evaluation Reveals 8% Perception Gap Between Open and Closed Models
Johns Hopkins University researchers present PerceptionRubrics (ICML 2026), pairing 1,000+ visually dense images with 12,004 atomic evaluation rubrics split into Must-Right and Easy-Wrong criteria. A gated binary scoring mechanism penalizes failures on mandatory visual elements rather than averaging scores. Key finding: an 8% perception gap persists between open-source frontier models and proprietary leaders.
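To make the difference between gated scoring and plain averaging concrete, here is a minimal sketch. It is not the authors' implementation: it assumes that a response scores zero if any Must-Right rubric fails, and otherwise scores the fraction of Easy-Wrong rubrics it passes. The class `RubricResult` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    must_right: list[bool]  # pass/fail for each mandatory rubric
    easy_wrong: list[bool]  # pass/fail for each error-prone rubric


def averaged_score(result: RubricResult) -> float:
    """Baseline: treat every rubric equally and average the passes."""
    checks = result.must_right + result.easy_wrong
    return sum(checks) / len(checks) if checks else 0.0


def gated_score(result: RubricResult) -> float:
    """Assumed gate: any failed Must-Right rubric zeroes the whole score."""
    if not all(result.must_right):
        return 0.0
    if not result.easy_wrong:
        return 1.0
    return sum(result.easy_wrong) / len(result.easy_wrong)


# One missed mandatory element out of six rubrics overall.
example = RubricResult(must_right=[True, False, True],
                       easy_wrong=[True, True, True])
print(averaged_score(example), gated_score(example))
```

Under these assumptions the example averages to roughly 0.83 but is gated to 0.0, which is the kind of penalty the item describes.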
Zhipu AI Launches ZCode Coding Agent on GLM-5.2, Targeting Cursor and Claude Code
Zhipu AI · Zhipu AI released ZCode on July 2, a desktop coding agent harness powered by GLM-5.2, an MIT-licensed open-weight model with ~750B MoE parameters and 1M-token context. It ships with 20+ integrated developer tools including Git and terminal access, multi-agent collaboration, and remote control via WeChat, Feishu, and Telegram. Zhipu is offering 5 million free tokens to new users through July 31.
Kimi K2.7-Code Becomes the First Open-Weight Model in GitHub Copilot
Moonshot AI · Kimi K2.7-Code, an open-weight model from Moonshot AI that posts a +21.8% gain on Kimi Code Bench v2 over its predecessor, became generally available in GitHub Copilot on July 1, making it the first open-weight model selectable from the Copilot model picker. It is available across VS Code (v1.127.0+), GitHub.com, JetBrains, Xcode, Eclipse, and GitHub Mobile, hosted by GitHub on Microsoft Azure.
Cascade Reaches End-of-Life; Devin Local Launches with Open ACP Protocol
Cognition · Cascade, the agentic core of Windsurf (rebranded to Devin Desktop on June 2), reached end-of-life on July 1. Devin Local replaces it: a Rust rewrite claiming up to 30% better token efficiency with parallel subagents. Devin Desktop now ships native support for the open Agent Client Protocol (ACP, Apache 2.0), allowing Codex, Claude Agent, Gemini CLI, and OpenCode to run as first-class sessions. CI pipelines invoking Cascade by name require manual re-pointing.
Runway Launches Agent Skills for Autonomous Ad Campaign and Commercial Production
Runway · On July 2, Runway shipped Agent Skills across all plan tiers. The feature executes complete ad campaigns, commercials, and localized ad variants through natural-language commands, with the agent handling multi-step creative production (scripting, generation, and adaptation) automatically. It builds on Runway's Gen-4.5 video backbone and is accessible at runwayml.com/agent.
For reference (6)
Yandex Consolidates AI Teams Under Alice AI with New Leadership Appointments
Yandex · Yandex restructured its AI leadership on July 2, unifying teams under Alice AI as a single cross-functional platform. Dmitry Timko was appointed to head Alice AI; Alexander Popovskiy takes over global Search. The reorganization aims to shorten release cycles and accelerate rollout of specialized AI assistants including planned 'neuro-lawyer' and 'neuro-accountant' features.
ELDR: Expert-Locality-Aware Routing Cuts MoE Serving Latency by up to 14%
Microsoft Research · Microsoft Research introduces ELDR, a routing system for prefill-decode disaggregated serving of MoE models. During prefill, it builds an expert signature per request; during decode, offline K-means clustering and online locality-band routing minimize distinct expert weight loads across workers. Tested on up to 40 GPUs and three MoE models, ELDR achieves 5.9–13.9% median time-per-output-token improvement over load-balancing baselines.
FlashMorph: Data-Driven Hybrid Attention Layer Placement via Learnable Gates
ByteDance Seed · ByteDance Seed and Fudan University researchers propose FlashMorph, which determines optimal layer placement for hybrid attention architectures (full vs linear attention) using learnable gates optimized on synthetic long-context retrieval data. Gates are discretized into a fixed hybrid layout after training. FlashMorph finds more effective configurations than heuristic methods while preserving long-context recall and benchmark performance.
Claude Code v2.1.199: Stacked Slash-Skill Invocations and Streaming Reliability
Anthropic · Claude Code v2.1.199 (July 2) added stacked slash-skill invocations (up to 5 leading skills per command), changed SSL certificate errors to surface actionable guidance immediately, and improved streaming reliability so partial output is retained when the API emits mid-stream errors.
GitHub Copilot CLI v1.0.68: Kimi K2.7-Code Support in Headless Environments
GitHub · GitHub Copilot CLI v1.0.68 (July 1) added support for the kimi-k2.7-code model alongside improvements to transient IDE disconnect handling and Thai/Devanagari terminal text rendering. v1.0.69-0 (pre-release, July 2) added file and folder completion to /sandbox path entries.
Sberbank Integrates GigaChat AI Analyst into SberBiznes for Marketplace Sellers
Sber · On July 1, Sberbank integrated a GigaChat-based AI analyst into its SberBiznes online banking platform for marketplace sellers. The analyst answers questions on 110 topics including sales, advertising effectiveness, and profitability using ABC/XYZ analysis. Responses are delivered in approximately 30 seconds versus 5–15 minutes with traditional BI tools.