Ollama v0.23.1: Gemma 4 MTP Speculative Decoding Delivers 2× Speed on Apple Silicon

Ollama v0.23.1, released May 5, 2026, introduces Gemma 4 MTP (Multi-Token Prediction) speculative decoding for the MLX runner on Apple Silicon, delivering a speedup of more than 2× for the Gemma 4 31B model on coding tasks. The release also includes MLX and MLX-C threading fixes and an upgrade to Go 1.26.
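For readers unfamiliar with the technique, the following is a minimal sketch of the generic draft-and-verify idea behind speculative decoding, in its greedy form. It is not Ollama's MLX implementation: every name here (`speculative_decode`, `toy_target`, `toy_draft`) is hypothetical, and the two "models" are toy functions over integer tokens. The sketch also omits the bonus token that many implementations emit when a whole draft is accepted.

```python
from typing import Callable, List

Token = int
# A "model" here is just a greedy next-token function (an assumption for illustration).
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; output equals greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target model checks the proposal position by position.
        #    A real runner would score all drafted positions in a single
        #    batched forward pass; this loop only mirrors the accept logic.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch: discard the rest of the draft
        out.extend(accepted)
    return out


def toy_target(ctx: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Because every kept token is the one the target model would have chosen, the output is identical to plain greedy decoding. Any speedup comes from verifying several drafted tokens per target-model pass, so the gain depends on how often the draft agrees with the target.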

Why it matters

More than doubling coding throughput for a 31B model on commodity Mac hardware makes local coding-agent workflows more practical without depending on a cloud service.

Importance: 2/5

2× speed boost for Gemma 4 on Apple Silicon via MTP speculative decoding.

inference ollama local-ai apple-silicon speculative-decoding gemma release

Sources

official Ollama Releases — ollama/ollama