llama.cpp b9589–b9592: CUDA SSM Sync Fix and Mamba Memory Optimization
Four builds landed around June 10. b9589 adds the thread-synchronization barriers that were missing before shared memory is reused in the CUDA SSM scan operations, a correctness bug affecting Mamba-family models running on GPU. b9590 fixes LFM2/LFM2.5 ignoring the json_schema supplied in response_format. b9591 consolidates the device-to-device (D2D) memory copies for MTP/Mamba into a single strided transfer and refactors ggml_gated_delta_net, reducing overhead. b9592 updates LibreSSL to 4.3.2.
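To make the b9589 bug class concrete, here is a minimal CPU-side analogy in Python, not the actual CUDA kernel. It assumes a toy "thread block" in which each thread writes one slot of a shared buffer, reads its neighbour's slot, and then reuses the buffer in the next round. threading.Barrier stands in for CUDA's __syncthreads(); run_block and its parameters are hypothetical names chosen for illustration.

```python
import threading

def run_block(num_threads=8, rounds=3):
    """Toy model of one thread block reusing a shared buffer across rounds."""
    shared = [0] * num_threads                    # stands in for __shared__ memory
    out = [[0] * num_threads for _ in range(rounds)]
    barrier = threading.Barrier(num_threads)      # stands in for __syncthreads()

    def worker(tid):
        for r in range(rounds):
            shared[tid] = r * 100 + tid           # write phase
            barrier.wait()                        # all writes land before any read
            out[r][tid] = shared[(tid + 1) % num_threads]  # read neighbour's slot
            # Barrier before reuse: without it, a fast thread could start the
            # next round and overwrite its slot before the neighbour has read it.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

With both barriers in place, every thread reads the value its neighbour wrote in the same round. Dropping the second barrier gives the kind of race the fix describes: the result depends on thread timing and may be wrong without any error being raised.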
Why it matters
The CUDA SSM sync fix addresses a silent correctness issue: affected users may have been getting subtly wrong outputs from Mamba models on GPU without any error or warning. The memory transfer consolidation improves throughput for Mamba architectures, which are gaining traction as alternatives to attention.
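As an illustration of what a "single strided transfer" means, the sketch below models it on the CPU with bytearrays. It is an analogy only: strided_copy and the buffer sizes are hypothetical, and the source does not say which GPU copy call the llama.cpp change uses. The idea is that one request described by a width, a row count, and a pitch (the distance between row starts) moves the same bytes as many small per-row copies.

```python
def strided_copy(dst, dst_pitch, src, src_pitch, width, rows):
    """Copy `rows` chunks of `width` bytes; chunk r starts at r * pitch in each buffer."""
    for r in range(rows):
        s, d = r * src_pitch, r * dst_pitch
        dst[d:d + width] = src[s:s + width]

src = bytearray(range(12))      # 3 rows whose starts are 4 bytes apart

# Before: the caller issues one small copy per row (3 separate requests).
packed = bytearray(6)
for r in range(3):
    packed[r * 2:r * 2 + 2] = src[r * 4:r * 4 + 2]
per_row = bytes(packed)

# After: a single request describes all rows via (width, rows, pitch).
packed = bytearray(6)
strided_copy(packed, 2, src, 4, width=2, rows=3)
strided = bytes(packed)
```

Both paths produce the same packed bytes. On a GPU the benefit of the single request comes from paying the per-call launch and bookkeeping cost once rather than once per row.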
Importance: 2/5
Correctness fix for Mamba/SSM GPU inference; silent bug that could affect output quality for local Mamba model users.