llama.cpp b9161/b9169: Codex CLI Compatibility and Qwen3A Multimodal Support
ggml-org
llama.cpp b9161 (May 15) adds Codex CLI compatibility: unsupported Responses API tools are now detected and skipped with a warning instead of hard-failing the request, enabling local models to serve as backends for the OpenAI Codex CLI workflow. b9169 adds MTMD (multimodal) chunk support and fixes preprocessing for Qwen3A, including corrections to audio token handling and chunk size limits to prevent out-of-memory (OOM) errors. b9174 (May 16) restructures the WebUI into tools/ui with updated CMake variables.
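The skip-instead-of-fail behavior is the key change for Codex CLI: a request that declares tools the server cannot execute no longer aborts outright. A minimal sketch of the pattern, assuming a nlohmann::json request body; `filter_tools` and `supported_tool_types` are illustrative names, not llama.cpp's actual internals:

```cpp
// Sketch: filter a Responses-style "tools" array, warning on unsupported
// entries instead of rejecting the whole request. Names are hypothetical.
#include <nlohmann/json.hpp>
#include <cstdio>
#include <set>
#include <string>

using json = nlohmann::json;

// Tool types this sketch pretends the server supports.
static const std::set<std::string> supported_tool_types = {"function"};

json filter_tools(const json & tools) {
    json kept = json::array();
    for (const auto & tool : tools) {
        const std::string type = tool.value("type", "");
        if (supported_tool_types.count(type)) {
            kept.push_back(tool);
        } else {
            // Skip with a warning rather than hard-failing the request.
            fprintf(stderr, "warning: skipping unsupported tool type '%s'\n",
                    type.c_str());
        }
    }
    return kept;
}
```

Skipping with a warning keeps the rest of the request usable, which matters for agent clients like Codex that may attach tools a local server cannot execute.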
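The chunk size limit in the Qwen3A preprocessing fix bounds how much data a single multimodal chunk can carry. A rough sketch of the splitting idea, with `split_chunks` and its parameters as hypothetical stand-ins for the actual MTMD code:

```cpp
// Sketch: cap multimodal (MTMD) chunk sizes so one oversized audio or image
// segment cannot exhaust memory during preprocessing. Hypothetical helper.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split n_tokens into (offset, length) chunks of at most max_chunk tokens.
std::vector<std::pair<size_t, size_t>> split_chunks(size_t n_tokens,
                                                    size_t max_chunk) {
    std::vector<std::pair<size_t, size_t>> chunks;
    for (size_t off = 0; off < n_tokens; off += max_chunk) {
        chunks.emplace_back(off, std::min(max_chunk, n_tokens - off));
    }
    return chunks;
}
```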
Why it matters
Codex CLI compatibility in llama.cpp lets developers swap in locally hosted models within OpenAI's agentic coding workflow, enabling fully offline or self-hosted alternatives. Qwen3A multimodal support extends local inference options for the rapidly adopted Qwen3 family.
Importance: 2/5
Codex CLI compatibility bridges local inference with OpenAI's coding agent ecosystem