Ollama v0.31.1: Gemma 4 Nearly 90% Faster on Apple Silicon via MTP
Ollama
Ollama v0.31.1 (June 30) makes Gemma 4 token generation approximately 90% faster on Apple Silicon via multi-token prediction (MTP), with automatic tuning enabled by default, so no configuration is required. The release also updates the MLX engine with a new small-batch matrix multiplication kernel and upgrades the llama.cpp backend to build 9840.
Why it matters
A roughly 1.9x throughput gain for Gemma 4 on Mac hardware makes running the model locally more practical for interactive, latency-sensitive use cases such as coding agents.
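To make the throughput figure concrete, here is a minimal sketch of the arithmetic, assuming "90% faster" means tokens per second rise by a factor of 1.9. The function names and the 50 tokens/s baseline are hypothetical, chosen only for illustration; they are not measurements from the release notes.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert an 'N% faster' throughput claim into a multiplier."""
    return 1.0 + percent_faster / 100.0


def per_token_time_reduction(percent_faster: float) -> float:
    """Fraction of per-token generation time saved at the given speedup."""
    return 1.0 - 1.0 / speedup_factor(percent_faster)


# Hypothetical baseline, NOT a figure from the release notes.
baseline_tps = 50.0

new_tps = baseline_tps * speedup_factor(90.0)
reduction = per_token_time_reduction(90.0)

print(f"throughput: {baseline_tps:.1f} -> {new_tps:.1f} tokens/s")
print(f"per-token time saved: {reduction:.1%}")
```

Under that assumption, each token takes about 47% less time to generate, slightly short of the 50% saving a full doubling would give.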
Importance: 2/5
Roughly 90% faster Gemma 4 inference on Apple Silicon via MTP; a no-configuration improvement for local inference
Sources
official
Ollama v0.31.1 Release Notes