NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads
NVIDIA
NVIDIA Nemotron 3 Ultra, announced at Computex, became available on June 4. The model has 550B total and ~55B active parameters in a Mixture-of-Experts Hybrid Mamba-Attention architecture, and it targets long-running agentic tasks with persistent memory and multi-step tool use. It scores 48 on the Artificial Analysis Intelligence Index, the highest among US open-weights models. It is distributed via Hugging Face, ModelScope, and OpenRouter, and as NVIDIA NIM microservices; inference reaches 300+ tokens/second on DeepInfra.
Why it matters
Nemotron 3 Ultra is currently the most capable US-origin open-weights model, giving teams a strong self-hostable option for complex agent pipelines without relying on closed APIs. The Hybrid Mamba architecture reduces memory-bandwidth demands at long context, which makes multi-agent orchestration more cost-effective.
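For teams weighing the self-hosting option, a rough sizing sketch based on the parameter counts quoted in the summary may help. The bytes-per-parameter values below are illustrative assumptions, not a statement about which precisions the released checkpoints use, and the estimate covers weights only (no KV cache, Mamba state, or activations).

```python
# Back-of-the-envelope sizing from the figures quoted in the summary.
TOTAL_PARAMS = 550e9   # total parameters
ACTIVE_PARAMS = 55e9   # approximate active parameters per token

def active_fraction(total: float, active: float) -> float:
    """Share of parameters used per token in the MoE forward pass."""
    return active / total

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {frac:.0%}")
    # Hypothetical precisions, chosen only to show how the footprint scales.
    for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{label} weights: ~{gb:.0f} GB")
```

All 550B parameters must be resident in memory even though only about a tenth are active per token, so the total count drives the hardware footprint while the active count drives per-token compute.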
Importance: 4/5
SOTA US open-weights model (48 on AA Intelligence Index); official NVIDIA announcement + media coverage; 550B scale with practical inference speed.