llama.cpp Builds b9830–b9837: DFlash v2, MiniCPM5 Parser, --reasoning-preserve Flag
ggml-org
Six llama.cpp builds shipped June 28–29 (b9830–b9837). Key additions: b9830 adds an `--offline` flag to `llama download` for cache-only model access and fixes a use-after-free in URL-task callbacks; b9831 adds DFlash v2 with per-layer sliding window attention; b9833 implements a dedicated MiniCPM5 PEG parser with XML tool-call support; b9837 adds `--reasoning-preserve` to retain chain-of-thought tokens in Jinja and chat output.
Why it matters
DFlash v2 broadens model compatibility for local inference. `--reasoning-preserve` gives developers explicit control over whether thinking traces surface in output, which matters more as a growing number of local models expose chain-of-thought tokens.
Importance: 2/5
Six builds in two days, including DFlash v2 and `--reasoning-preserve`; continuous delivery from the primary local inference library.