agents — AI Digest

30 Apr GLM-5V-Turbo: a natively multimodal foundation model for agents Z.ai research
7 May xAI Releases Grok 4.3 with 1M Context, 40-60% Price Cuts, and Agentic Benchmark Gains xAI models-llm
8 May Anthropic Launches Claude Managed Agents: Dreams, Outcomes, Multiagent Orchestration Anthropic tools
12 May Google Announces Gemini Intelligence for Android with Cross-App Automation Google tools
15 May EVA-Bench: End-to-End Framework for Evaluating Voice Agents ServiceNow AI research
19 May Google I/O 2026: Gemini 4, Jules V2, Firebase Studio GA, Android XR, and Aluminium OS Google DeepMind models-llm
20 May Gemini 3.5 Flash Released at Google I/O 2026: Frontier Coding + Agentic at Flash Speed Google DeepMind models-llm
2 Jun Alibaba Launches Qwen3.7-Plus: Multimodal Agent with Vision, Reasoning, and Autonomous Execution Alibaba / Qwen models-llm
8 Jun NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads NVIDIA models-llm
18 Jun GitHub Copilot App Is Now Generally Available GitHub tools
30 Apr Yandex Commerce Protocol: first retailers launch sales via Alice AI Yandex industry
30 Apr Mistral Workflows: public preview of a Temporal-based engine for enterprise AI orchestration Mistral tools
30 Apr Recursive Multi-Agent Systems: agent communication in latent space Stanford University research
2 May Eywa: heterogeneous collaboration framework between LLM agents and scientific foundation models University of Illinois at Urbana-Champaign research
6 May Anthropic Launches Ten Financial Services AI Agent Templates with Microsoft 365 Integration Anthropic tools
6 May Roo Code Announces Shutdown on May 15, Pivoting to Roomote Cloud Agent Roo Code tools
7 May MiniMax Hailuo 2.3 Launches with Media Agent and 50% Cheaper Batch Video Generation MiniMax video
9 May ByteDance Launches Doubao-Seed-2.0-lite: First Omni-Modal Model in Seed Series ByteDance models-llm
10 May Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4 Google DeepMind research
11 May Claude Code v2.1.139–v2.1.140: Agent View Research Preview and /goal Command Anthropic tools
13 May Claude Platform on AWS Reaches General Availability Anthropic tools
13 May RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards Google research
14 May Anthropic Launches Claude for Small Business Anthropic tools
14 May Notion Launches Developer Platform 3.5 with External Agents API, Workers, and CLI Notion tools
19 May Anthropic Acquires Stainless, the SDK and MCP Tooling Startup Used by OpenAI and Google Anthropic industry
20 May Google Launches Gemini Spark: 24/7 Personal AI Agent in Google AI Ultra Google tools
20 May Google Launches Antigravity 2.0: Agent-First Dev Platform with Desktop App, CLI, and Managed Agents API Google tools
20 May Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes) Multi-institution (42 authors) research
20 May SkillsVote: Lifecycle Governance of Agent Skills — Collection, Recommendation, Evolution (219 HF upvotes) Memtensor Research Group / IAAR-Shanghai research
4 Jun Microsoft Launches Scout: Always-On Autopilot AI Agent for Microsoft 365 Microsoft tools
6 Jun OpenAI Rolls Out Lockdown Mode to Block Prompt-Injection Exfiltration in ChatGPT OpenAI tools
12 Jun OpenAI Acquires German Startup Ona to Power Persistent Codex Cloud Agents OpenAI industry
16 Jun NVIDIA SkillSpector: Open-Source Security Scanner for AI Agent Skills NVIDIA tools
18 Jun OpenAI Launches Scheduled Tasks in ChatGPT, Sunsets Pulse OpenAI tools
19 Jun AWS Summit New York 2026: Bedrock AgentCore GA, Kiro iOS Preview, and AWS Context Previewed Amazon tools
19 Jun OpenAI Publishes Deployment Simulation: Predicting Model Behavior Before Release OpenAI research
19 Jun ENPIRE: AI Coding Agents Close the Loop on Physical Robotics Research Without Human Intervention NVIDIA / Carnegie Mellon University / UC Berkeley research
8 May Automated Weak-to-Strong Researcher: AI Agents Outperform Humans on Alignment Research Anthropic research
10 May Anthropic Eliminates Claude's Agentic Blackmail Behavior via 'Teaching Claude Why' Anthropic research
6 Jun MLEvolve: Self-Evolving Multi-Agent LLM Framework for Automated ML Algorithm Discovery research
11 Jun Kwai Keye-VL-2.0: Open-Source 30B MoE Multimodal Model with 256K Context for Long Video Kwai research
17 Jun JoyAI-VL-Interaction: Open-Source 8B Real-Time VLM with Autonomous Turn-Taking JD.com research
6 May Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs research
7 May GitHub Copilot VS Code April Releases: BYOK Model Keys, Browser Tab Sharing, Terminal Write GitHub tools
7 May AWS MCP Server Reaches General Availability with Full API Access and IAM Audit Controls Amazon Web Services tools
7 May GitHub MCP Server: Secret Scanning GA and Dependency Scanning Public Preview GitHub tools
8 May Google DeepMind Publishes AlphaEvolve One-Year Impact Report Google DeepMind research
8 May AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4 Google DeepMind research
9 May OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents Tencent Hunyuan research
9 May ARIS: Autonomous ML Research via Adversarial Multi-Agent Collaboration Shanghai Jiao Tong University research
14 May LangChain Launches LangSmith Engine (Public Beta) and SmithDB at Interrupt 2026 LangChain tools
16 May SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents Zhejiang University / Meituan research
16 May MemLens: Benchmark for Multimodal Long-Term Memory in Vision-Language Models NVIDIA research
19 May MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes) Shanghai Jiao Tong University research
2 Jun Crafter: Multi-Agent Harness for Editable Scientific Figure Generation Scores +16pt Over Baselines (103 HF Upvotes) Tsinghua University research
2 Jun GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF Upvotes) University of Massachusetts Amherst research
6 Jun The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary research
8 Jun Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning Carnegie Mellon University / Ohio State University research
11 Jun Claude Code v2.1.172–v2.1.173: Nested Sub-Agents Up to 5 Levels Deep Anthropic tools
11 Jun Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement NLPIR Lab research
11 Jun DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data AweAI Team research
14 Jun Moonshot AI Opens Kimi Work Desktop Agent with 300-Sub-Agent Swarm and WebBridge Moonshot AI tools
14 Jun EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments MIT / NUS / Salesforce research
14 Jun WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate Microsoft Research research
14 Jun InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation CUHK Multimedia Lab research
16 Jun FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% Microsoft / Shanghai Jiao Tong University research
18 Jun Cursor 3.7: Cloud Dev Environments and /in-cloud Subagents Cursor tools
19 Jun Google DeepMind Publishes AI Control Roadmap: Defense-in-Depth Against Misaligned Coding Agents Google DeepMind research
28 Apr Firefly AI Assistant — Public Beta Adobe image
1 May AutoResearchBench — a benchmark for autonomous scientific literature search by AI agents BAAI research
5 May OpenClaw 2026.5.3: File Transfer Plugin and Cross-Platform Messaging Reliability tools
11 May AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40 research
6 Jun Sber Launches GigaChat-Powered Multi-Agent Business Assistant for Corporate Banking at SPIEF 2026 Sber industry
12 Jun InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation research
12 Jun EvoArena: LLM Agents Score Only 39.6% on Dynamic Evolving Environments Benchmark MIT research
12 Jun FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents research
11 May Alibaba Integrates Qwen AI with Taobao for End-to-End Agentic Shopping Alibaba industry
13 May Alibaba Integrates Qwen AI with Taobao to Launch Agentic Conversational Shopping Alibaba industry
28 Apr Claude Code v2.1.121 Anthropic tools
28 Apr Codex CLI rust-v0.126.0-alpha.8 OpenAI tools
2 May GitHub Copilot for Visual Studio April 2026 update ships agentic workflows GitHub tools
4 May Intern-Atlas: 1M-Paper Methodology Evolution Graph as Research Infrastructure for AI Scientists research
6 May OpenClaw 2026.5.4: Google Meet Voice Bridge with Gemini and Backpressure-Aware Audio tools
6 May HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL research
7 May Cursor 3.3: Context Usage Breakdown for Agent Diagnostics Cursor tools
7 May LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents Shanghai Jiao Tong University research
7 May Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic research
8 May Claude Code v2.1.133: Effort Hooks, Worktree BaseRef Setting, and Admin Policy Keys Anthropic tools
8 May OpenClaw v2026.5.5: 60+ Bug Fixes Across Messaging Platforms and AI Providers tools
9 May Direct Corpus Interaction: Rethinking Retrieval for Agentic Search TIGER-Lab research
12 May NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation Shanghai AI Lab research
13 May OpenClaw v2026.5.12-beta: Subagent Session Nesting and 20-Turn Agent-to-Agent Ping-Pong tools
8 Jun SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory research
9 Jun SWE-Explore: Benchmarking Repository Exploration as the Binding Constraint in Coding Agents Shanghai Jiao Tong University research
10 Jun SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research research
11 Jun OpenCode v1.17.1–v1.17.3: Auth Recovery, Sub-Agent Permissions, Linux Launcher SST tools
16 Jun Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% National University of Singapore research
19 Jun GitHub Copilot June 18 Changelog: MAI-Code-1-Flash Expands and AGENTS.md Lands in Code Review GitHub tools
19 Jun Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops research
28 Apr Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond HKUST/NUS/Oxford/NTU research
10 May Yandex Launches Alice AI Agent to Search WW2 Veteran Records in Russian Archives Yandex tools
20 May Sber Opens Testing of GigaCowork: No-Code AI Agent Management Platform for Enterprises Sber tools