Daily digest

17 items · ~17 min · Week 2026-W27

Worth knowing (11)

Google Releases Nano Banana 2 Lite: 4-Second Images at $0.034 per 1,000

Google DeepMind
Image official + media 3 src. ~1 min

Google released Nano Banana 2 Lite (Gemini 3.1 Flash-Lite Image) to general availability on June 30. It generates images in approximately 4 seconds at $0.034 per 1,000 images. The model is available in Google AI Studio and the Gemini API, and is rolling out to AI Mode in Google Search, the Gemini app, NotebookLM, and Google Photos. All outputs include SynthID watermarking.

Why it matters
Nano Banana 2 Lite sets a new low-cost bar in production image generation APIs. At under four cents per thousand images with 4-second generation, high-volume image pipelines become economically practical at frontier quality.

Meta Announces "Meta Compute" Cloud Business to Monetize Surplus AI Infrastructure

Meta AI
Industry media only 3 src. ~1 min

Meta revealed plans on July 1 to launch a cloud business, working name "Meta Compute", that would sell access to its AI infrastructure to outside customers. Two tiers are planned: raw compute rental and hosted model access. The effort is led by infrastructure head Santosh Janardhan, Meta Superintelligence Labs chief Daniel Gross, and president Dina Powell McCormick. Meta's stock rose approximately 9% on the news. No pricing or launch date has been announced.

Why it matters
Meta is on track to spend up to $145 billion on AI infrastructure in 2026. Selling excess capacity would transform that investment from a pure cost center into revenue and put Meta in direct competition with AWS, Azure, and Google Cloud in the enterprise AI market for the first time.

DeepSeek Confirms V4 Official Launch for Mid-July with Peak-Time API Pricing

DeepSeek
Models / LLM official + media 3 src. ~1 min

DeepSeek confirmed around July 1 that the official release of DeepSeek V4 is scheduled for mid-July, following its April 24 preview. V4 ships as V4-Pro (1.6T total / 49B active params) and V4-Flash (284B total / 13B active), both with a 1M-token context window. For the first time, DeepSeek introduces peak/off-peak API pricing: prices double during 9 AM–12 PM and 2–6 PM Beijing time. Legacy model names deepseek-chat and deepseek-reasoner retire on July 24.

Why it matters
The 1M-context upgrade across both tiers raises the bar for open-weight models. The peak pricing experiment signals DeepSeek is managing real capacity constraints — even at peak prices, V4 remains more than 17x cheaper than GPT-5.5.
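
The peak/off-peak schedule described above can be sketched as a small lookup. This is a minimal illustration, not DeepSeek's billing logic: the function names are hypothetical, the two windows are treated as half-open hour ranges (an assumption about boundary handling), weekends and holidays are not modeled, and the 2x multiplier is applied to whatever base price the caller supplies.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 year-round (no daylight saving), so a fixed offset suffices.
BEIJING = timezone(timedelta(hours=8))

# Peak windows from the digest item, as half-open [start_hour, end_hour) ranges.
# Boundary handling is an assumption, not something the announcement specifies.
PEAK_WINDOWS = [(9, 12), (14, 18)]
PEAK_MULTIPLIER = 2.0


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls inside a peak window."""
    local = moment.astimezone(BEIJING)
    return any(start <= local.hour < end for start, end in PEAK_WINDOWS)


def effective_price(base_price: float, moment: datetime) -> float:
    """Scale a caller-supplied base price by the peak multiplier when applicable."""
    return base_price * (PEAK_MULTIPLIER if is_peak(moment) else 1.0)
```

For example, a request sent at 02:30 UTC lands at 10:30 Beijing time, so `effective_price(1.0, datetime(2026, 7, 15, 2, 30, tzinfo=timezone.utc))` returns `2.0`. Batch jobs that can wait could use `is_peak` to defer work to off-peak hours.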

OpenAI Releases GeneBench-Pro, a Frontier Benchmark for AI Agents in Biology

OpenAI
Research official + media 2 src. ~1 min

OpenAI released GeneBench-Pro (June 30), a 129-problem benchmark testing AI judgment across genomics, cancer biology, clinical diagnostics, and pharmacogenomics. Problems require sequential judgment calls that a human expert would take 20–40 hours to resolve. GPT-5.6 Sol scores 28.7% (31.5% in Pro mode); Claude Opus 4.8 scores 16.0%. Ten representative questions are open-sourced on Hugging Face.

Why it matters
Unlike knowledge-recall benchmarks, GeneBench-Pro measures 'research taste' under uncertainty. GPT-5.6 Sol failing more than 70% of expert-level tasks shows the gap between current frontier models and autonomous scientific reasoning.

Anthropic Proposes Industry-Wide Cyber Jailbreak Severity Scale

Anthropic
Research official 1 src. ~1 min

In a post published July 2, Anthropic detailed Fable 5's four-tier cybersecurity classifier and proposed the Cyber Jailbreak Severity (CJS) scale, which runs from CJS-0 through CJS-4 and scores jailbreaks on capability gain, attack breadth, ease of weaponization, and discoverability. The scale was developed with Project Glasswing partners including Amazon, Microsoft, and Google, and is offered for industry-wide adoption.

Why it matters
A shared severity vocabulary for AI jailbreaks mirrors how CVSS scoring standardized traditional vulnerability disclosure. If CJS is adopted across labs, it enables faster coordinated response to safety incidents and gives policymakers a concrete metric.

Program-as-Weights: Compile-Once Adapter Paradigm Matches 32B Models at 1/50 the Memory

Research official 1 src. ~1 min

Researchers from the University of Waterloo introduce Program-as-Weights (PAW), where a 4B-parameter compiler generates small reusable adapter weights for tasks that resist rule-based solutions. A 0.6B Qwen3 interpreter guided by these adapters matches a 32B model while using 1/50th the inference memory and running at 30 tokens/second on a MacBook M3. The authors also release FuzzyBench, a 10-million-example training dataset.

Why it matters
PAW reframes foundation model usage from per-input inference to a compile-once, run-many pattern. The 50x memory reduction brings 32B-class task performance to consumer hardware.

PerceptionRubrics: Atomic Rubric Evaluation Reveals 8% Perception Gap Between Open and Closed Models

Research official 1 src. ~1 min

Johns Hopkins University researchers present PerceptionRubrics (ICML 2026), pairing 1,000+ visually dense images with 12,004 atomic evaluation rubrics split into Must-Right and Easy-Wrong criteria. A gated binary scoring mechanism penalizes failures on mandatory visual elements rather than averaging scores. Key finding: an 8% perception gap persists between open-source frontier models and proprietary leaders.

Why it matters
Standard multimodal benchmarks inflate scores by averaging over components; PerceptionRubrics exposes brittleness in visually rich domains and correlates better with human judgment.
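
The difference between gated and averaged scoring can be pictured with a short sketch. This is an illustration under assumptions, not the paper's scoring rule: the gate here zeroes a sample's score whenever any Must-Right rubric fails and otherwise returns the pass rate over all rubrics. The class name, function names, and the exact formula are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    passed: bool      # binary judgment for one atomic rubric
    must_right: bool  # True for mandatory (Must-Right) rubrics


def averaged_score(results: list[RubricResult]) -> float:
    """Plain average: every rubric counts equally."""
    return sum(r.passed for r in results) / len(results)


def gated_score(results: list[RubricResult]) -> float:
    """Hypothetical gate: any failed Must-Right rubric zeroes the sample."""
    if any(r.must_right and not r.passed for r in results):
        return 0.0
    return averaged_score(results)
```

With four rubrics where one mandatory rubric fails and the other three pass, `averaged_score` reports 0.75 while `gated_score` reports 0.0, which is the kind of inflation the item describes.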

Zhipu AI Launches ZCode Coding Agent on GLM-5.2, Targeting Cursor and Claude Code

Zhipu AI
Tools official + media 3 src. ~1 min

Zhipu AI released ZCode on July 2. It is a desktop coding agent harness powered by GLM-5.2, an MIT-licensed open-weight model with ~750B MoE parameters and a 1M-token context window. ZCode ships with 20+ integrated developer tools, including Git and terminal access, and supports multi-agent collaboration and remote control via WeChat, Feishu, and Telegram. Zhipu is offering 5 million free tokens to new users through July 31.

Why it matters
ZCode directly enters the coding-assistant market dominated by Cursor, GitHub Copilot, and Claude Code. GLM-5.2 was already ranking first among open-source models on Artificial Analysis at roughly one-sixth the cost of GPT-5.5, making the competitive threat concrete.

Kimi K2.7-Code Becomes the First Open-Weight Model in GitHub Copilot

Moonshot AI
Tools official 1 src. ~1 min

Kimi K2.7-Code, an open-weight model from Moonshot AI that gains 21.8% over its predecessor on Kimi Code Bench v2, became generally available in GitHub Copilot on July 1, making it the first open-weight model selectable from the Copilot model picker. It is available across VS Code (v1.127.0+), GitHub.com, JetBrains, Xcode, Eclipse, and GitHub Mobile, and is hosted by GitHub on Microsoft Azure.

Why it matters
Adding an open-weight option to the Copilot picker breaks the all-proprietary-model pattern and gives developers a lower-cost, auditable alternative inside their existing workflow without switching tools.

Cascade Reaches End-of-Life; Devin Local Launches with Open ACP Protocol

Cognition
Tools media only 2 src. ~1 min

Cascade — the agentic core of Windsurf (rebranded to Devin Desktop on June 2) — reached end-of-life on July 1. Devin Local replaces it: a Rust rewrite claiming up to 30% better token efficiency with parallel subagents. Devin Desktop now ships native support for the open Agent Client Protocol (ACP, Apache 2.0), allowing Codex, Claude Agent, Gemini CLI, and OpenCode to run as first-class sessions. CI pipelines invoking Cascade by name require manual re-pointing.

Why it matters
Cascade's EOL completes Cognition's pivot from Windsurf's architecture. ACP support positions Devin Desktop as an agent host rather than a single-agent IDE — a structural bet that developers want to manage multiple coding agents from one interface.

Runway Launches Agent Skills for Autonomous Ad Campaign and Commercial Production

Runway
Video official 1 src. ~1 min

On July 2, Runway shipped Agent Skills across all plan tiers. The feature executes complete ad campaigns, commercials, and localized ad variants through natural-language commands, with the agent handling multi-step creative production — scripting, generation, and adaptation — automatically. Builds on Runway's Gen-4.5 video backbone, accessible at runwayml.com/agent.

Why it matters
Agent Skills marks Runway's shift from a generation tool into a full creative production agent targeting advertising and marketing workflows. It is the first commercially available multi-step video production agent from a major video AI lab.

For reference (6)

Yandex Consolidates AI Teams Under Alice AI with New Leadership Appointments

Yandex
Industry media only 3 src. ~1 min

Yandex restructured its AI leadership on July 2, unifying teams under Alice AI as a single cross-functional platform. Dmitry Timko was appointed to head Alice AI; Alexander Popovskiy takes over global Search. The reorganization aims to shorten release cycles and accelerate rollout of specialized AI assistants including planned 'neuro-lawyer' and 'neuro-accountant' features.

Why it matters
Yandex is consolidating engineering leadership and resources behind Alice AI as its primary commercial AI platform — a shift from fragmented service-by-service development toward a unified agentic interface.

ELDR: Expert-Locality-Aware Routing Cuts MoE Serving Latency by up to 14%

Microsoft Research
Research official 1 src. ~1 min

Microsoft Research introduces ELDR, a routing system for prefill-decode disaggregated serving of MoE models. During prefill, it builds an expert signature per request; during decode, offline K-means clustering and online locality-band routing minimize distinct expert weight loads across workers. Tested up to 40 GPUs and three MoE models, ELDR achieves 5.9–13.9% median time-per-output-token improvement over load-balancing baselines.

Why it matters
MoE models are increasingly dominant in production but serving them efficiently at disaggregated scale remains unsolved. ELDR's gains are pure routing policy — no model changes required — making it drop-in deployable for any existing MoE serving stack.
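
The idea of routing by expert signature can be sketched as a nearest-centroid assignment, which is the assignment step of K-means. This is a simplified illustration, not ELDR's implementation: it assumes one centroid per decode worker, assumes the centroids were already fitted offline, and leaves out locality-band routing and load balancing entirely. All function names are hypothetical.

```python
def expert_signature(activated: set[int], num_experts: int) -> list[float]:
    """Binary vector marking which experts a request activated during prefill."""
    return [1.0 if e in activated else 0.0 for e in range(num_experts)]


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def route(signature: list[float], centroids: list[list[float]]) -> int:
    """Return the index of the nearest centroid (assumed: one worker per centroid)."""
    return min(
        range(len(centroids)),
        key=lambda i: squared_distance(signature, centroids[i]),
    )
```

Requests that activate similar experts end up on the same worker, so that worker needs fewer distinct expert weights resident. A production router would also have to weigh queue depth against locality, which this sketch ignores.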

FlashMorph: Data-Driven Hybrid Attention Layer Placement via Learnable Gates

ByteDance Seed
Research official 1 src. ~1 min

ByteDance Seed and Fudan University researchers propose FlashMorph, which determines optimal layer placement for hybrid attention architectures (full vs linear attention) using learnable gates optimized on synthetic long-context retrieval data. Gates are discretized into a fixed hybrid layout after training. FlashMorph finds more effective configurations than heuristic methods while preserving long-context recall and benchmark performance.

Why it matters
Hybrid attention models are a key efficiency direction for long-context inference. FlashMorph provides a principled, data-driven method to discover optimal configurations — relevant to any team building or adapting hybrid attention architectures.
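
The discretization step mentioned above, turning learned gates into a fixed layout, might look like the following. This is a sketch under assumptions: it treats each layer as having a single scalar gate logit and picks full attention when the sigmoid of that logit reaches a threshold. The function names, the threshold rule, and the 0.5 default are hypothetical; the paper may use a different criterion, such as a budget on the number of full-attention layers.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a gate logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def discretize_layout(gate_logits: list[float], threshold: float = 0.5) -> list[str]:
    """Map each layer's learned gate to a fixed attention type."""
    return ["full" if sigmoid(g) >= threshold else "linear" for g in gate_logits]
```

Once discretized, the layout is static, so inference pays no gating cost: each layer simply runs the attention variant it was assigned.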

Claude Code v2.1.199: Stacked Slash-Skill Invocations and Streaming Reliability

Anthropic
Tools official 1 src. ~1 min

Claude Code v2.1.199 (July 2) added stacked slash-skill invocations (up to 5 leading skills per command), made SSL certificate errors surface actionable guidance immediately, and improved streaming reliability so that partial output is retained when the API emits mid-stream errors.

Why it matters
Stacked slash-skills enable more complex single-command workflows; streaming reliability fixes are important for users on high-load or flaky connections.

GitHub Copilot CLI v1.0.68: Kimi K2.7-Code Support in Headless Environments

GitHub
Tools official 1 src. ~1 min

GitHub Copilot CLI v1.0.68 (July 1) added support for the kimi-k2.7-code model alongside improvements to transient IDE disconnect handling and Thai/Devanagari terminal text rendering. v1.0.69-0 (pre-release, July 2) added file and folder completion to /sandbox path entries.

Why it matters
The CLI is the primary interface for Copilot in headless and CI environments. Adding Kimi K2.7-Code on the same day it landed in the IDE gives terminal-first and CI workflows immediate access to the open-weight model.

Sberbank Integrates GigaChat AI Analyst into SberBiznes for Marketplace Sellers

Sber
Tools media only 2 src. ~1 min

On July 1, Sberbank integrated a GigaChat-based AI analyst into its SberBiznes online banking platform for marketplace sellers. The analyst answers questions on 110 topics, including sales, advertising effectiveness, and profitability, using ABC/XYZ analysis. Responses are delivered in approximately 30 seconds versus 5–15 minutes with traditional BI tools.

Why it matters
One of the first GigaChat integrations embedded directly into a major Russian banking product, demonstrating Sber's strategy of distributing GigaChat through its existing business-banking ecosystem to reach e-commerce SMBs without a separate AI product.