llama.cpp June 16 Builds: Eagle3 Speculative Decoding, Vulkan UMA Memory, NVFP4 Fixes
llama.cpp shipped incremental builds b9660–b9672 on June 16. Highlights: backend sampling support for Eagle3 speculative decoding (b9669), a Vulkan preference for host-visible memory on UMA (unified memory architecture) devices (b9668), NVFP4 edge-case fixes in llama-graph (b9670), SYCL support for Q4_K, Q5_K, and Q6_K in MoE MUL_MAT_ID (b9664), and a vendored BoringSSL update to 0.20260616.0 (b9672).
Why it matters
Backend sampling support for Eagle3 speculative decoding makes one of the fastest local inference techniques usable on more hardware. The Vulkan change should benefit unified-memory systems, such as integrated GPUs and Apple unified-memory machines that run the Vulkan backend.
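For readers new to the technique, here is a minimal sketch of the generic greedy draft-and-verify loop behind speculative decoding. It is not Eagle3's draft head and not llama.cpp's implementation. The names `speculative_decode`, `target`, `draft`, and `k` are hypothetical, and both models are assumed to be plain functions that map a token prefix to the next token.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token prefix to the
# next token under greedy decoding. Real engines work with logits.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Return prompt plus n_new tokens using greedy draft-and-verify."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target model checks each proposed position. A real engine
        #    scores all positions in one batched forward pass; this sketch
        #    loops to stay readable.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: drop the rest of the draft

        out.extend(accepted)
    return out
```

Because every kept token is the one the target model would have chosen, the output matches plain greedy decoding with the target alone. The speedup comes from accepting several draft tokens per target pass when the draft guesses well. Production implementations usually also take one extra target token when the whole draft is accepted, which this sketch omits.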
Importance: 2/5
Routine daily builds, but the Eagle3 speculative decoding and Vulkan UMA changes are notable for local inference.
Sources
llama.cpp b9672 (official)
llama.cpp b9668: Vulkan UMA (official)