llama.cpp June 16 Builds: Eagle3 Speculative Decoding, Vulkan UMA Memory, NVFP4 Fixes

Tools, official, 3 sources, ~1 min read

llama.cpp shipped incremental builds b9660–b9672 on June 16. Notable changes: backend sampling support for Eagle3 speculative decoding (b9669), a Vulkan preference for host-visible memory on UMA (unified memory architecture) devices (b9668), NVFP4 edge-case fixes in llama-graph (b9670), SYCL support for Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (b9664), and a BoringSSL vendor update to 0.20260616.0 (b9672).
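To make the Vulkan item concrete, here is a rough sketch of what "preferring host-visible memory on UMA devices" can mean when choosing a memory type. It is an illustration only and is not taken from llama.cpp: the function name `pick_memory_type` and the preference order are assumptions. Only the flag values come from the Vulkan specification (`VkMemoryPropertyFlagBits`).

```python
from typing import List, Optional

# Bit values as defined by VkMemoryPropertyFlagBits in the Vulkan specification.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8


def pick_memory_type(type_flags: List[int], allowed_bits: int, is_uma: bool) -> Optional[int]:
    """Return the index of the first memory type matching the preference order.

    type_flags   -- property flags of each memory type the device exposes
    allowed_bits -- bitmask of usable type indices (like memoryTypeBits)
    is_uma       -- True when CPU and GPU share one physical memory pool

    The preference order below is a hypothetical policy, not llama.cpp's.
    """
    if is_uma:
        # On shared memory, a host-visible type avoids staging copies.
        preferences = [DEVICE_LOCAL | HOST_VISIBLE, HOST_VISIBLE, DEVICE_LOCAL]
    else:
        # On discrete GPUs, plain device-local memory is the usual choice.
        preferences = [DEVICE_LOCAL]

    for wanted in preferences:
        for index, flags in enumerate(type_flags):
            usable = (allowed_bits >> index) & 1
            if usable and (flags & wanted) == wanted:
                return index
    return None


# Example device exposing three memory types.
types = [
    DEVICE_LOCAL,
    DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,
]
uma_choice = pick_memory_type(types, 0b111, is_uma=True)
discrete_choice = pick_memory_type(types, 0b111, is_uma=False)
```

With this example device, the UMA path picks the type that is both device-local and host-visible, while the discrete path picks the first device-local type.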

Why it matters

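Speculative decoding is among the most effective ways to speed up local inference, and Eagle3 support in the backend sampler makes it usable on more hardware. The Vulkan UMA change mainly helps integrated GPUs and other unified-memory systems; on Apple silicon it applies only when running the Vulkan backend rather than Metal.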
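For readers new to the technique, the following is a minimal sketch of generic greedy speculative decoding: a cheap draft model proposes several tokens and the target model verifies them. It is not Eagle3's drafting method and does not use any llama.cpp API; `speculative_step`, `generate`, and the toy models are hypothetical names chosen for illustration.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. The target checks each position. A real engine does this in one
    #    batched forward pass; here it is a plain loop for clarity.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Generate n_new tokens after the prompt using speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_new]


# Toy models: the target counts modulo 10; the draft errs after token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = generate(toy_target, toy_draft, [0], n_new=12, k=3)
```

In this greedy variant the output is identical to decoding with the target model alone; the gain comes from the target accepting several tokens per verification round whenever the draft guesses correctly.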

Importance: 2/5

Routine daily builds, but with notable Eagle3 speculative decoding and Vulkan UMA improvements for local inference.

llama-cpp inference local-llm open-source speculative-decoding

Sources

official llama.cpp b9672

official llama.cpp b9669: Eagle3 backend sampling

official llama.cpp b9668: Vulkan UMA