NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads

NVIDIA

Models / LLM | official + media | 2 sources | ~1 min read

NVIDIA Nemotron 3 Ultra became available on June 4, with the announcement made at Computex. The model has 550B total and ~55B active parameters in a Mixture-of-Experts Hybrid Mamba-Attention architecture, and it targets long-running agentic tasks with persistent memory and multi-step tool use. It scores 48 on the Artificial Analysis Intelligence Index, the highest among US open-weights models. It is distributed via Hugging Face, ModelScope, OpenRouter, and as NVIDIA NIM microservices; inference reaches 300+ tokens/second on DeepInfra.
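To put the parameter counts in perspective, here is a rough back-of-envelope sketch. It uses only the 550B and ~55B figures quoted above; the weight formats and bytes-per-parameter values are generic assumptions, not published deployment details, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Back-of-envelope figures derived from the parameter counts quoted above.
TOTAL_PARAMS = 550e9   # total parameters (from the summary)
ACTIVE_PARAMS = 55e9   # approximate active parameters per token (from the summary)

# Assumed bytes per parameter for common weight formats (illustrative only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters that take part in computing each token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {share:.0%}")
    for fmt, size in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, size)
        print(f"{fmt}: about {gb:,.0f} GB of weights")
```

In a Mixture-of-Experts model, all expert weights generally have to be resident even though only a fraction is used per token, so the total count drives memory sizing while the active count drives per-token compute.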

Why it matters

Nemotron 3 Ultra is currently the most capable US-origin open-weights model by Artificial Analysis Intelligence Index score, giving teams a strong self-hostable option for complex agent pipelines without relying on closed APIs. The Hybrid Mamba architecture reduces memory-bandwidth demand at long context, which makes multi-agent orchestration more cost-effective.
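The long-context claim can be illustrated with a simplified memory model. Attention layers keep a key/value cache that grows with context length, while Mamba-style state-space layers keep a fixed-size recurrent state. Every dimension below (layer counts, head sizes, state sizes) is hypothetical and chosen only for illustration; none of it is Nemotron 3 Ultra's actual configuration, and the state formula omits details such as convolution buffers.

```python
def kv_cache_bytes(context_len: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV cache size: keys and values for every cached token."""
    return 2 * context_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem


def ssm_state_bytes(n_ssm_layers: int, d_inner: int, d_state: int,
                    bytes_per_elem: int = 2) -> int:
    """Per-sequence recurrent state size: independent of context length."""
    return n_ssm_layers * d_inner * d_state * bytes_per_elem


def hybrid_bytes(context_len: int) -> int:
    """Hypothetical hybrid stack: 8 attention layers plus 56 SSM layers."""
    return (kv_cache_bytes(context_len, n_attn_layers=8, n_kv_heads=8, head_dim=128)
            + ssm_state_bytes(n_ssm_layers=56, d_inner=8192, d_state=128))


def attention_only_bytes(context_len: int) -> int:
    """Hypothetical all-attention stack with the same total depth (64 layers)."""
    return kv_cache_bytes(context_len, n_attn_layers=64, n_kv_heads=8, head_dim=128)


if __name__ == "__main__":
    for ctx in (8_192, 131_072, 1_048_576):
        full = attention_only_bytes(ctx) / 1e9
        hyb = hybrid_bytes(ctx) / 1e9
        print(f"context {ctx:>9,}: attention-only {full:8.2f} GB, hybrid {hyb:8.2f} GB")
```

Under these assumed dimensions the hybrid stack's per-sequence memory grows far more slowly with context, which is the general mechanism behind lower memory traffic during long-context decoding.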

Importance: 4/5

Top-scoring US open-weights model (48 on the Artificial Analysis Intelligence Index); official NVIDIA announcement plus media coverage; 550B scale with practical inference speed.

open-weights moe agents inference long-context us

Sources

official NVIDIA Debuts Nemotron 3 Family of Open Models — NVIDIA Newsroom

media NVIDIA Releases Nemotron 3 Ultra — MarkTechPost