Ollama v0.30.7: Hermes Desktop Support, Gemma 4 QAT, and Nemotron-3-Ultra

Ollama


Ollama v0.30.7 (June 7, 2026) adds native Windows support for Hermes Desktop and aligns the model list returned by the OpenAI-compatible API with the available tags. v0.30.6 (June 5) added Gemma 4 models optimized with Quantization-Aware Training (QAT), which cuts memory requirements by roughly 72% while keeping quality close to the original. v0.30.4 (June 3) introduced Nemotron-3-Ultra support for reasoning and long-running agent workflows, and fixed Metal GPU offload for multimodal models on Apple Silicon. v0.30.2 added Qwen Code support and improved token accounting for cached prompts.
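To see how the two model lists compare on a given install, a small script can pull both and diff them. The sketch below is illustrative only: it assumes a local server at Ollama's default address, uses the OpenAI-compatible `/v1/models` endpoint and the native `/api/tags` endpoint, and the helper names (`fetch_json`, `diff_model_lists`, `check_local_server`) are made up for this example. The release notes do not spell out exactly what the alignment covers, so this compares names and nothing more.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is listening on its default local address.
BASE_URL = "http://localhost:11434"


def fetch_json(path, base_url=BASE_URL):
    """GET a JSON document from the local Ollama server."""
    with urlopen(base_url + path, timeout=5) as resp:
        return json.load(resp)


def diff_model_lists(openai_payload, tags_payload):
    """Compare model names from the two endpoints.

    openai_payload: parsed response of GET /v1/models ({"data": [{"id": ...}]})
    tags_payload:   parsed response of GET /api/tags  ({"models": [{"name": ...}]})
    Returns (only_in_openai, only_in_tags) as sorted lists.
    """
    openai_ids = {m["id"] for m in openai_payload.get("data", [])}
    tag_names = {m["name"] for m in tags_payload.get("models", [])}
    return sorted(openai_ids - tag_names), sorted(tag_names - openai_ids)


def check_local_server():
    """Hypothetical helper: fetch both lists and print any mismatch."""
    only_openai, only_tags = diff_model_lists(
        fetch_json("/v1/models"), fetch_json("/api/tags")
    )
    print("only in /v1/models:", only_openai)
    print("only in /api/tags:", only_tags)
```

Calling `check_local_server()` against a running server prints two lists. If both are empty, the two endpoints report the same model names.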

Why it matters

Gemma 4 QAT support sharply lowers the hardware bar for running Google's multimodal model locally, and Nemotron-3-Ultra support brings NVIDIA's flagship reasoning model to local inference. Six releases in five days reflect active integration work across several new model families.
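For a rough sense of what a ~72% reduction means in practice, the arithmetic can be sketched as below. The 20 GB baseline is a hypothetical figure chosen for illustration, not a published Gemma 4 number, and `memory_after_reduction` is a made-up helper.

```python
def memory_after_reduction(baseline_gb, reduction=0.72):
    """Estimate the memory footprint after a fractional reduction.

    baseline_gb: memory needed by the unoptimized model (hypothetical input)
    reduction:   fraction saved; 0.72 matches the ~72% figure quoted above
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return baseline_gb * (1.0 - reduction)


# Hypothetical example: a model that would need 20 GB before optimization.
estimate = memory_after_reduction(20.0)
print(f"{estimate:.1f} GB")  # prints "5.6 GB"
```

Under that assumed baseline, the model would move from needing a high-end GPU to fitting in the memory of a mid-range card or laptop.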

Importance: 2/5

Patch release cluster adding major model support (Gemma 4 QAT, Nemotron-3-Ultra) to local inference.

ollama inference local-llm open-source

Sources

official Ollama Releases — GitHub