llama.cpp b9754: Real-Time Model Load Progress via SSE and PEG Grammar Parser
llama.cpp shipped ~12 tagged builds on June 21, 2026 (b9743–b9754). Key additions: b9747 adds real-time model load progress tracking via /models/sse (Server-Sent Events); b9750 implements the Jinja call statement for template generation; b9754 adds an automaton-based PEG parser for stricter grammar-constrained generation. All builds ship cross-platform binaries for macOS, Linux, Windows, and Android.
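For frontends that want to consume the new progress stream, a minimal sketch of a Server-Sent Events reader follows. It parses the standard text/event-stream framing (event: and data: fields, with a blank line ending each event). The /models/sse path comes from the release summary above. The base URL, the watch_load_progress helper name, and any payload fields are assumptions, since the event schema is not described here.

```python
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def watch_load_progress(base_url: str = "http://localhost:8080") -> None:
    """Hypothetical helper: print raw events from /models/sse.

    The base URL is an assumption; the payload is printed as-is because
    its schema is not documented in this summary.
    """
    with urllib.request.urlopen(f"{base_url}/models/sse") as resp:
        lines = (chunk.decode("utf-8") for chunk in resp)
        for ev in parse_sse(lines):
            print(ev["event"], ev["data"])
```

The parser is kept separate from the network call so it can be exercised against canned lines without a running server.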
Why it matters
Real-time SSE progress streaming makes model startup latency visible to local inference frontends instead of leaving it opaque; the PEG grammar parser improves the reliability of structured output.
Importance: 2/5
Active release cadence (about 12 builds in one day); two new functional features for local inference deployments.