Daily digest
8 items · ~8 min · Week 2026-W24
Must-read (1)
NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads
NVIDIA · NVIDIA Nemotron 3 Ultra became available on June 4, announced at Computex. The model has 550B total and ~55B active parameters in a Mixture-of-Experts Hybrid Mamba-Attention architecture targeting long-running agentic tasks with persistent memory and multi-step tool use. It scores 48 on the Artificial Analysis Intelligence Index, the highest among US open-weights models. Distributed via Hugging Face, ModelScope, OpenRouter, and as NVIDIA NIM microservices; inference reaches 300+ tokens/second on DeepInfra.
Worth knowing (4)
Google DeepMind Releases Gemma 4 QAT Checkpoints: Sub-1 GB On-Device E2B Model
Google DeepMind · Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the full Gemma 4 family on June 5. A new mobile QAT format cuts the E2B (2B) model to under 1 GB RAM (from 9.6 GB in BF16), while Q4_0 QAT reduces E2B from 9.6 GB to 3.2 GB and E4B from 15 GB to 5 GB. Weights ship on Hugging Face with immediate support in llama.cpp (b9549+ adds Gemma 4 MTP support), Ollama, LM Studio, vLLM, MLX, and LiteRT-LM.
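To put the quoted sizes in perspective, the short sketch below computes the compression ratio and memory saved for the Q4_0 QAT figures. The sizes are copied from the item above; the function and variable names are illustrative only, and nothing here reflects how the checkpoints are actually produced.

```python
# Sizes in GB as quoted in the digest item (BF16 vs. Q4_0 QAT).
SIZES_GB = {
    "E2B": {"bf16": 9.6, "q4_0_qat": 3.2},
    "E4B": {"bf16": 15.0, "q4_0_qat": 5.0},
}


def reduction(before_gb, after_gb):
    """Return (compression ratio, percent of memory saved)."""
    ratio = before_gb / after_gb
    saved_pct = 100.0 * (1.0 - after_gb / before_gb)
    return ratio, saved_pct


def summarize(sizes=SIZES_GB):
    """Map each model name to (ratio rounded to 2 places, percent saved rounded to 1)."""
    rows = {}
    for model, s in sizes.items():
        ratio, saved = reduction(s["bf16"], s["q4_0_qat"])
        rows[model] = (round(ratio, 2), round(saved, 1))
    return rows


for model, (ratio, saved) in summarize().items():
    print(f"{model}: {ratio}x smaller, {saved}% of BF16 memory saved")
```

Both models come out at roughly a 3x reduction. The mobile format's "under 1 GB" figure for E2B implies a ratio above 9.6x, but the exact size is not given, so it is left out of the table.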
Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning
Carnegie Mellon University / Ohio State University · The paper provides the first theoretical proof that transformer-based agents learn depth-first search mechanisms purely from sparse RL feedback, without expert demonstrations. A two-head transformer is constructed where one head tracks prior actions and another detects failures and triggers backtracking. Under a depth-wise curriculum, DFS emerges in stages: models trained on shallow trees generalize to deeper ones, and imbalanced goal distributions cause return discounting to produce a prioritized DFS variant.
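For readers who want the target behavior spelled out, here is a minimal sketch of classical depth-first search with explicit backtracking on a tree. It is not the paper's transformer construction; it only mirrors the two roles described above, with `path` and `tried` standing in for "tracking prior actions" and the dead-end branch standing in for "detect failure and backtrack". The function name, the tree encoding, and the trace format are assumptions made for illustration.

```python
def dfs_with_backtracking(children, root, goal):
    """Search a tree depth-first for `goal`.

    children: dict mapping a node to an ordered list of its child nodes
              (assumed to describe a tree, so no cycle check is done).
    Returns (path, trace). `path` is the root-to-goal list of nodes, or
    None if the goal is not in the tree. `trace` logs every step taken.
    """
    path = [root]        # current branch: the record of prior actions
    tried = {root: 0}    # number of children already tried at each node
    trace = [("visit", root)]

    while path:
        node = path[-1]
        if node == goal:
            return list(path), trace

        kids = children.get(node, [])
        i = tried[node]
        if i < len(kids):
            # An untried action remains at this node: descend into it.
            tried[node] = i + 1
            child = kids[i]
            tried[child] = 0
            path.append(child)
            trace.append(("visit", child))
        else:
            # Dead end: every option here has failed, so back up one level.
            path.pop()
            trace.append(("backtrack", node))

    return None, trace


# Small example tree: r -> a, b; a -> c, d; b -> e
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
path, trace = dfs_with_backtracking(tree, "r", "e")
print(path)
for step in trace:
    print(step)
```

The trace makes the backtracking visible: the search exhausts the subtree under `a` before it ever tries `b`, which is the behavior the paper argues can emerge from sparse reward alone.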
GitHub Copilot Gets 1M Token Context Window and Configurable Reasoning Levels
GitHub / Microsoft · GitHub announced on June 4 that Copilot now supports a one-million-token context window, enabling work across larger codebases and multi-file projects without losing context. Configurable reasoning levels let developers tune speed-vs-depth and enable extended thinking for architectural and debugging tasks. Both features are available in VS Code, Copilot CLI, and the Copilot app; larger context or higher reasoning consumes more GitHub AI Credits.
GitHub Copilot SDK Reaches General Availability with MCP and Six-Language Support
GitHub / Microsoft · The GitHub Copilot SDK went GA on June 2, available in Node.js/TypeScript, Python, Go, .NET, Rust, and Java. It exposes Copilot's full agentic runtime (planning, tool invocation, file edits, streaming, and multi-turn sessions) through a stable API. Developers can register custom tools, connect MCP servers, override built-in tools, and support multi-client workflows where different clients contribute tools and permissions to the same session. Available to all Copilot subscribers, and to non-subscribers via BYOK (bring your own key).
For reference (3)
SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory
SubtleMemory introduces a 1,522-instance benchmark designed to test whether AI agents can handle memories that reinforce, diverge from, or contradict each other, going beyond simple recall. Built over 10 long histories grounded in 1,090 relation-controlled memory-variant sets, it evaluates 11 memory systems. All tested systems show systematic failure at fine-grained relational memory discrimination, with distinct failure modes across preservation, retrieval, and downstream reasoning stages.
Code2LoRA: Hypernetwork Generates Repo-Specific Adapters for Code LMs with Zero Inference Overhead
University of Waterloo · Code2LoRA generates repository-specific LoRA adapters for code language models with zero inference-time token overhead. Two variants: Code2LoRA-Static converts a repo snapshot into an adapter; Code2LoRA-Evo maintains adapters via GRU state updated per code diff. Introduces RepoPeftBench (604 Python repos, static and evolution tracks). Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching per-repository LoRA fine-tuning without any per-repo training.
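As background on why an adapter adds no prompt tokens, the sketch below shows the standard LoRA update, in which a low-rank product is added to a frozen weight matrix: W' = W + (alpha / r) * B A. The repository information lives in the weights, so the prompt is unchanged. This is generic LoRA arithmetic in pure Python, not Code2LoRA's hypernetwork; the helper names and the tiny matrices are made up for illustration.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix.

    w: d_out x d_in base weights, b: d_out x rank, a: rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def apply(w, x):
    """Compute the matrix-vector product W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Toy example: 2x2 identity base weights and a rank-1 adapter.
W = [[1, 0], [0, 1]]
B = [[1], [2]]      # 2 x 1
A = [[3, 4]]        # 1 x 2
merged = merge_lora(W, B, A, alpha=2, rank=1)

x = [1, 1]
print(merged)
print(apply(merged, x))
```

Once merged, inference is a single matrix product per layer with the same input as before, so whatever generates `A` and `B` (here hard-coded, in the paper a hypernetwork) costs nothing in context length.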
VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding
Yale University · VideoKR introduces a 315K-example training corpus for knowledge- and reasoning-intensive video understanding, built from 145K CC-licensed expert-domain videos with chain-of-thought rationales at progressively deeper reasoning depths. Includes VideoKR-Eval, an expert-annotated benchmark requiring genuine video-grounded reasoning rather than textual shortcuts. SFT followed by GRPO post-training on VideoKR outperforms prior post-training approaches.
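For context on the GRPO step, the sketch below shows the group-relative advantage that gives the method its name: several responses are sampled for one prompt, and each reward is normalized against the group's mean and standard deviation. This is the generic formulation as commonly described, not VideoKR's training recipe; the use of population standard deviation, the `eps` constant, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to the same video question
# (1 = correct, 0 = incorrect).
rewards = [1, 0, 0, 1]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.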