Ollama v0.23.1: Gemma 4 MTP Speculative Decoding Delivers 2× Speed on Apple Silicon
Ollama v0.23.1, released May 5, 2026, introduces Gemma 4 MTP (Multi-Token Prediction) speculative decoding for the MLX runner on Apple Silicon, delivering a more than 2× speedup for the Gemma 4 31B model on coding tasks. The release also includes MLX and MLX-C threading fixes and a bump to Go 1.26.
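For readers unfamiliar with the technique, the following is a minimal sketch of greedy draft-and-verify speculative decoding. It is not Ollama's or MLX's implementation: the function names (speculative_decode, greedy_decode, toy_target, toy_draft) and the integer "models" are hypothetical stand-ins chosen for illustration. With MTP, the draft tokens typically come from extra prediction heads of the model itself rather than from a separate small model; the sketch abstracts that away behind a draft function.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (hypothetical interface)


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify decoding; the result matches greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. Draft up to k tokens with the cheap predictor.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. Verify the draft against the target. A real runner would score all
        #    drafted positions in a single batched forward pass; this toy calls
        #    the target once per position to keep the logic visible.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != tok:
                break  # first mismatch: discard the remaining draft tokens
        out.extend(accepted)
    return out


# Toy stand-ins (assumptions for illustration only): the target counts upward
# modulo 10, and the draft agrees except right after a 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new=12))
```

Because every emitted token is the one the target would have chosen, the output is identical to plain greedy decoding. Any speedup comes from accepting several drafted tokens per target pass, so it depends on how often the draft agrees with the target.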
Why it matters
More than doubling coding throughput for a state-of-the-art 31B model on commodity Mac hardware makes local coding-agent workflows more practical without relying on the cloud.
Importance: 2/5
2× speed boost for Gemma 4 on Apple Silicon via MTP speculative decoding.
Sources
official
Ollama Releases — ollama/ollama