llama.cpp June 16 Builds: Eagle3 Speculative Decoding, Vulkan UMA Memory, NVFP4 Fixes


llama.cpp shipped incremental builds b9660–b9672 on June 16. Notable changes: backend sampling support for Eagle3 speculative decoding (b9669), a Vulkan preference for host-visible memory on UMA devices (b9668), NVFP4 edge-case fixes in llama-graph (b9670), SYCL support for Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (b9664), and a BoringSSL vendor update to 0.20260616.0 (b9672).

Why it matters

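Backend sampling support for Eagle3 lets speculative decoding, one of the most effective ways to speed up local inference, run on more hardware. The Vulkan preference for host-visible memory benefits iGPU and Apple unified-memory setups that use the Vulkan backend, where the CPU and GPU share the same physical RAM.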
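For readers new to the technique, here is a minimal sketch of the generic draft-then-verify loop behind speculative decoding, assuming greedy sampling. It does not model Eagle3's draft head or llama.cpp's backend sampler; toy_target, toy_draft, speculative_step, and generate are hypothetical names made up for illustration, and the "models" are plain functions over integer tokens.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumed interface)


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical 'small' model: agrees with the target except after token 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the tokens accepted in it."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    # 2. The target checks each proposed token in order. A real engine scores
    #    all k positions in one batched forward pass; this sketch calls the
    #    target once per position to keep the logic visible.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Generate max_new tokens after the prompt using speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new]


def greedy(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: plain one-token-at-a-time greedy decoding with the target only."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out
```

Under greedy sampling, generate returns exactly the sequence greedy would, because every emitted token is either confirmed or supplied by the target. The speedup in practice comes from the target verifying several drafted tokens per forward pass, so it depends on how often the draft agrees with the target.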

Importance: 2/5

Routine daily builds, but with notable Eagle3 speculative decoding and Vulkan UMA improvements for local inference.

Sources

official llama.cpp b9672