llama.cpp Builds b9830–b9837: DFlash v2, MiniCPM5 Parser, --reasoning-preserve Flag

ggml-org

Tools | official | 1 source | ~1 min read

Six llama.cpp builds shipped June 28–29, spanning b9830–b9837. Key additions: b9830 adds an `--offline` flag to `llama download` for cache-only model access and fixes a use-after-free in URL-task callbacks; b9831 adds DFlash v2 with per-layer sliding window attention; b9833 implements a dedicated MiniCPM5 PEG parser with XML tool-call support; b9837 adds `--reasoning-preserve`, which retains chain-of-thought tokens in Jinja and chat output.

Why it matters

DFlash v2 broadens the range of models that local inference can serve. `--reasoning-preserve` gives developers explicit control over whether thinking traces surface in output, which is increasingly relevant as more local models expose chain-of-thought tokens.
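
To make the idea of preserving versus dropping a thinking trace concrete, here is a minimal Python sketch. It is not llama.cpp's implementation: the `<think>...</think>` delimiter, the `render_message` helper, and the `preserve_reasoning` parameter are all assumptions chosen for illustration, and real models and chat templates may mark reasoning differently.

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think>.
# Actual delimiters vary by model and chat template.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def render_message(content: str, preserve_reasoning: bool) -> str:
    """Return the message with its reasoning block kept or stripped.

    Hypothetical helper for illustration only; it mimics the effect of a
    preserve/strip switch, not how llama.cpp implements it.
    """
    if preserve_reasoning:
        return content
    # Remove every reasoning block plus the whitespace that follows it.
    return THINK_BLOCK.sub("", content)


reply = "<think>2 + 2 is 4.</think>\nThe answer is 4."
stripped = render_message(reply, preserve_reasoning=False)
kept = render_message(reply, preserve_reasoning=True)
```

With the switch off, only the final answer remains in `stripped`; with it on, `kept` is the reply unchanged, trace included.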

Importance: 2/5

Six builds in two days, including DFlash v2 and `--reasoning-preserve`; continuous delivery from the primary local inference library.

llama-cpp inference open-source speculative-decoding

Sources

official llama.cpp releases — ggml-org/llama.cpp on GitHub