llama.cpp b9603: Qualcomm Adreno OpenCL Kernels for On-Device Inference
ggml-org
llama.cpp release b9603 (June 12) added OpenCL GEMM and GEMV kernels for the q5_0 and q5_1 quantization formats on Qualcomm Adreno GPUs, co-authored with Qualcomm engineers. This enables GPU-accelerated inference for models in those formats on Qualcomm-powered Android devices and Snapdragon laptops. Other recent builds in the same window: b9601 (Vulkan build fix), b9596 (server router-mode logging optimization), b9591 (MTP memory optimization), b9590 (LFM2 json_schema fix).
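For readers unfamiliar with the format, here is a minimal Python sketch of what a q5_0 GEMV has to compute. It assumes the block layout used by ggml's CPU reference code: 32 weights per block, stored as an fp16 scale, a 32-bit word holding each weight's fifth bit, and 16 bytes of packed low nibbles. The function names are illustrative, and the Adreno OpenCL kernels in b9603 may arrange data differently for performance.

```python
import struct

QK5_0 = 32          # weights per block (assumed from ggml's reference layout)
BLOCK_BYTES = 22    # 2 (fp16 scale) + 4 (high bits) + 16 (low nibbles)


def dequantize_q5_0_block(block: bytes) -> list[float]:
    """Decode one q5_0 block into 32 floats."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a q5_0 block is expected to be 22 bytes")
    # Little-endian: fp16 scale d, then a uint32 carrying the fifth bit of each weight.
    d, qh = struct.unpack_from("<eI", block, 0)
    qs = block[6:BLOCK_BYTES]
    out = [0.0] * QK5_0
    for j in range(QK5_0 // 2):
        hi0 = ((qh >> j) & 1) << 4           # fifth bit of weight j
        hi1 = ((qh >> (j + 16)) & 1) << 4    # fifth bit of weight j + 16
        q0 = (qs[j] & 0x0F) | hi0            # low nibble  -> weight j
        q1 = (qs[j] >> 4) | hi1              # high nibble -> weight j + 16
        # 5-bit values 0..31 are re-centered to -16..15 before scaling.
        out[j] = (q0 - 16) * d
        out[j + 16] = (q1 - 16) * d
    return out


def gemv_q5_0(rows: list[bytes], x: list[float]) -> list[float]:
    """Illustrative matrix-vector product: each row is a sequence of q5_0 blocks."""
    y = []
    for row in rows:
        acc = 0.0
        for b in range(len(row) // BLOCK_BYTES):
            block = row[b * BLOCK_BYTES:(b + 1) * BLOCK_BYTES]
            w = dequantize_q5_0_block(block)
            xs = x[b * QK5_0:(b + 1) * QK5_0]
            acc += sum(wi * xi for wi, xi in zip(w, xs))
        y.append(acc)
    return y
```

q5_1 differs in that, under the same assumed layout, each block also stores a minimum value that is added after scaling instead of subtracting the fixed offset of 16. A GPU kernel would fuse the dequantization and the dot product rather than materializing the floats as this sketch does.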
Why it matters
Adreno is one of the most widely deployed mobile GPU architectures, shipping in Qualcomm Snapdragon chips. These OpenCL kernels extend optimized quantized inference to a large hardware base that previously had limited llama.cpp acceleration support.
Importance: 2/5
Adreno OpenCL kernels extend accelerated mobile inference to one of the most widely deployed mobile GPU architectures.
Sources
official
llama.cpp b9603 release — GitHub