-
GLM-5V-Turbo: A Natively Multimodal Foundation Model for Agents
Z.ai
research
-
xAI Releases Grok 4.3 with 1M Context, 40-60% Price Cuts, and Agentic Benchmark Gains
xAI
models-llm
-
Anthropic Launches Claude Managed Agents: Dreams, Outcomes, Multiagent Orchestration
Anthropic
tools
-
Google Announces Gemini Intelligence for Android with Cross-App Automation
Google
tools
-
EVA-Bench: End-to-End Framework for Evaluating Voice Agents
ServiceNow AI
research
-
Google I/O 2026: Gemini 4, Jules V2, Firebase Studio GA, Android XR, and Aluminium OS
Google DeepMind
models-llm
-
Gemini 3.5 Flash Released at Google I/O 2026: Frontier Coding + Agentic at Flash Speed
Google DeepMind
models-llm
-
Alibaba Launches Qwen3.7-Plus: Multimodal Agent with Vision, Reasoning, and Autonomous Execution
Alibaba / Qwen
models-llm
-
NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads
NVIDIA
models-llm
-
GitHub Copilot App Is Now Generally Available
GitHub
tools
-
Yandex Commerce Protocol: First Retailers Launch Sales via Alice AI
Yandex
industry
-
Mistral Workflows: Public Preview of a Temporal-Based Engine for Enterprise AI Orchestration
Mistral
tools
-
Recursive Multi-Agent Systems: Agent Communication in Latent Space
Stanford University
research
-
Eywa: Heterogeneous Collaboration Framework Between LLM Agents and Scientific Foundation Models
University of Illinois at Urbana-Champaign
research
-
Anthropic Launches Ten Financial Services AI Agent Templates with Microsoft 365 Integration
Anthropic
tools
-
Roo Code Announces Shutdown on May 15, Pivoting to Roomote Cloud Agent
Roo Code
tools
-
MiniMax Hailuo 2.3 Launches with Media Agent and 50% Cheaper Batch Video Generation
MiniMax
video
-
ByteDance Launches Doubao-Seed-2.0-lite: First Omni-Modal Model in Seed Series
ByteDance
models-llm
-
Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4
Google DeepMind
research
-
Claude Code v2.1.139–v2.1.140: Agent View Research Preview and /goal Command
Anthropic
tools
-
Claude Platform on AWS Reaches General Availability
Anthropic
tools
-
RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards
Google
research
-
Anthropic Launches Claude for Small Business
Anthropic
tools
-
Notion Launches Developer Platform 3.5 with External Agents API, Workers, and CLI
Notion
tools
-
Anthropic Acquires Stainless, the SDK and MCP Tooling Startup Used by OpenAI and Google
Anthropic
industry
-
Google Launches Gemini Spark: 24/7 Personal AI Agent in Google AI Ultra
Google
tools
-
Google Launches Antigravity 2.0: Agent-First Dev Platform with Desktop App, CLI, and Managed Agents API
Google
tools
-
Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes)
Multi-institution (42 authors)
research
-
SkillsVote: Lifecycle Governance of Agent Skills — Collection, Recommendation, Evolution (219 HF upvotes)
Memtensor Research Group / IAAR-Shanghai
research
-
Microsoft Launches Scout: Always-On Autopilot AI Agent for Microsoft 365
Microsoft
tools
-
OpenAI Rolls Out Lockdown Mode to Block Prompt-Injection Exfiltration in ChatGPT
OpenAI
tools
-
OpenAI Acquires German Startup Ona to Power Persistent Codex Cloud Agents
OpenAI
industry
-
NVIDIA SkillSpector: Open-Source Security Scanner for AI Agent Skills
NVIDIA
tools
-
OpenAI Launches Scheduled Tasks in ChatGPT, Sunsets Pulse
OpenAI
tools
-
AWS Summit New York 2026: Bedrock AgentCore GA, Kiro iOS Preview, and AWS Context Previewed
Amazon
tools
-
OpenAI Publishes Deployment Simulation: Predicting Model Behavior Before Release
OpenAI
research
-
ENPIRE: AI Coding Agents Close the Loop on Physical Robotics Research Without Human Intervention
NVIDIA / Carnegie Mellon University / UC Berkeley
research
-
Automated Weak-to-Strong Researcher: AI Agents Outperform Humans on Alignment Research
Anthropic
research
-
Anthropic Eliminates Claude's Agentic Blackmail Behavior via 'Teaching Claude Why'
Anthropic
research
-
MLEvolve: Self-Evolving Multi-Agent LLM Framework for Automated ML Algorithm Discovery
research
-
Kwai Keye-VL-2.0: Open-Source 30B MoE Multimodal Model with 256K Context for Long Video
Kwai
research
-
JoyAI-VL-Interaction: Open-Source 8B Real-Time VLM with Autonomous Turn-Taking
JD.com
research
-
Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs
research
-
GitHub Copilot VS Code April Releases: BYOK Model Keys, Browser Tab Sharing, Terminal Write
GitHub
tools
-
AWS MCP Server Reaches General Availability with Full API Access and IAM Audit Controls
Amazon Web Services
tools
-
GitHub MCP Server: Secret Scanning GA and Dependency Scanning Public Preview
GitHub
tools
-
Google DeepMind Publishes AlphaEvolve One-Year Impact Report
Google DeepMind
research
-
AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4
Google DeepMind
research
-
OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents
Tencent Hunyuan
research
-
ARIS: Autonomous ML Research via Adversarial Multi-Agent Collaboration
Shanghai Jiao Tong University
research
-
LangChain Launches LangSmith Engine (Public Beta) and SmithDB at Interrupt 2026
LangChain
tools
-
SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents
Zhejiang University / Meituan
research
-
MemLens: Benchmark for Multimodal Long-Term Memory in Vision-Language Models
NVIDIA
research
-
MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes)
Shanghai Jiao Tong University
research
-
Crafter: Multi-Agent Harness for Editable Scientific Figure Generation Scores +16pt Over Baselines (103 HF upvotes)
Tsinghua University
research
-
GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF upvotes)
University of Massachusetts Amherst
research
-
The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary
research
-
Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning
Carnegie Mellon University / Ohio State University
research
-
Claude Code v2.1.172–v2.1.173: Nested Sub-Agents Up to 5 Levels Deep
Anthropic
tools
-
Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement
NLPIR Lab
research
-
DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data
AweAI Team
research
-
Moonshot AI Opens Kimi Work Desktop Agent with 300-Sub-Agent Swarm and WebBridge
Moonshot AI
tools
-
EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments
MIT / NUS / Salesforce
research
-
WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate
Microsoft Research
research
-
InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation
CUHK Multimedia Lab
research
-
FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60%
Microsoft / Shanghai Jiao Tong University
research
-
Cursor 3.7: Cloud Dev Environments and /in-cloud Subagents
Cursor
tools
-
Google DeepMind Publishes AI Control Roadmap: Defense-in-Depth Against Misaligned Coding Agents
Google DeepMind
research
-
Firefly AI Assistant — Public Beta
Adobe
image
-
AutoResearchBench: A Benchmark for Autonomous Scientific Literature Search by AI Agents
BAAI
research
-
OpenClaw 2026.5.3: File Transfer Plugin and Cross-Platform Messaging Reliability
tools
-
AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40
research
-
Sber Launches GigaChat-Powered Multi-Agent Business Assistant for Corporate Banking at SPIEF 2026
Sber
industry
-
InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation
research
-
EvoArena: LLM Agents Score Only 39.6% on Dynamic Evolving Environments Benchmark
MIT
research
-
FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents
research
-
Alibaba Integrates Qwen AI with Taobao for End-to-End Agentic Shopping
Alibaba
industry
-
Alibaba Integrates Qwen AI with Taobao to Launch Agentic Conversational Shopping
Alibaba
industry
-
Claude Code v2.1.121
Anthropic
tools
-
Codex CLI rust-v0.126.0-alpha.8
OpenAI
tools
-
GitHub Copilot for Visual Studio April 2026 Update Ships Agentic Workflows
GitHub
tools
-
Intern-Atlas: 1M-Paper Methodology Evolution Graph as Research Infrastructure for AI Scientists
research
-
OpenClaw 2026.5.4: Google Meet Voice Bridge with Gemini and Backpressure-Aware Audio
tools
-
HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL
research
-
Cursor 3.3: Context Usage Breakdown for Agent Diagnostics
Cursor
tools
-
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
Shanghai Jiao Tong University
research
-
Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic
research
-
Claude Code v2.1.133: Effort Hooks, Worktree BaseRef Setting, and Admin Policy Keys
Anthropic
tools
-
OpenClaw v2026.5.5: 60+ Bug Fixes Across Messaging Platforms and AI Providers
tools
-
Direct Corpus Interaction: Rethinking Retrieval for Agentic Search
TIGER-Lab
research
-
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation
Shanghai AI Lab
research
-
OpenClaw v2026.5.12-beta: Subagent Session Nesting and 20-Turn Agent-to-Agent Ping-Pong
tools
-
SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory
research
-
SWE-Explore: Benchmarking Repository Exploration as the Binding Constraint in Coding Agents
Shanghai Jiao Tong University
research
-
SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research
research
-
OpenCode v1.17.1–v1.17.3: Auth Recovery, Sub-Agent Permissions, Linux Launcher
SST
tools
-
Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23%
National University of Singapore
research
-
GitHub Copilot June 18 Changelog: MAI-Code-1-Flash Expands and AGENTS.md Lands in Code Review
GitHub
tools
-
Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops
research
-
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
HKUST/NUS/Oxford/NTU
research
-
Yandex Launches Alice AI Agent to Search WW2 Veteran Records in Russian Archives
Yandex
tools
-
Sber Opens Testing of GigaCowork: No-Code AI Agent Management Platform for Enterprises
Sber
tools