Ollama v0.31.1: Gemma 4 Nearly 90% Faster on Apple Silicon via MTP

Ollama

Tools official 1 src. ~1 min

Ollama v0.31.1 (June 30) makes Gemma 4 token generation approximately 90% faster on Apple Silicon by using multi-token prediction (MTP). Automatic tuning is enabled by default, so no configuration is required. The release also updates the MLX engine with a new small-batch matrix multiplication kernel and upgrades the llama.cpp backend to build 9840.

Why it matters

A speedup of roughly 90% is close to doubling throughput (about 1.9x) for Gemma 4 on Mac hardware. That makes running the model locally more practical for interactive coding-agent use cases, where latency matters.
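To put the headline figure in latency terms, the sketch below converts a "percent faster" throughput claim into time saved per token, and shows one way to compute tokens per second from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns from `/api/generate`. The two response dictionaries are made-up illustrative numbers, not measurements from the release notes, and the helper names are hypothetical.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response.

    eval_count is the number of generated tokens and eval_duration is
    the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def percent_faster(before_tps: float, after_tps: float) -> float:
    """Throughput gain expressed as 'X% faster'."""
    return (after_tps / before_tps - 1.0) * 100.0


def time_saved_percent(pct_faster: float) -> float:
    """Reduction in time per token implied by a throughput gain."""
    return (1.0 - 1.0 / (1.0 + pct_faster / 100.0)) * 100.0


# Illustrative (invented) responses: same token count, shorter duration.
before = {"eval_count": 400, "eval_duration": 10_000_000_000}  # 40 tok/s
after = {"eval_count": 400, "eval_duration": 5_263_157_895}    # ~76 tok/s

gain = percent_faster(tokens_per_second(before), tokens_per_second(after))
print(f"throughput gain: {gain:.1f}%")
print(f"time saved per token: {time_saved_percent(gain):.1f}%")
```

Under these assumed numbers, a 90% throughput gain corresponds to roughly 47% less time per generated token, which is why the change is better described as near-doubling speed than as near-eliminating latency.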

Importance: 2/5

90% Gemma 4 inference speedup on Apple Silicon via MTP; no-config improvement for local inference

ollama inference apple-silicon gemma local-llm mlx

Sources

official Ollama v0.31.1 Release Notes