llama.cpp b9161/b9169: Codex CLI Compatibility and Qwen3A Multimodal Support
ggml-org
llama.cpp b9161 (May 15) adds Codex CLI compatibility: unsupported Responses API tools are now detected and skipped with a warning instead of hard-failing the request, enabling local models to serve as backends for the OpenAI Codex CLI workflow. b9169 adds MTMD (multimodal) chunk support and fixes preprocessing for Qwen3A, including corrections to audio token handling and chunk size limits to prevent out-of-memory (OOM) errors. b9174 (May 16) restructures the WebUI into tools/ui with updated CMake variables.
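The skip-instead-of-fail behavior is the key change for Codex CLI: a request that declares tools the server cannot execute no longer aborts outright. A minimal sketch of the pattern, assuming a nlohmann::json request body; `filter_tools` and `supported_tool_types` are illustrative names, not llama.cpp's actual internals:

```cpp
// Sketch: filter a Responses-style "tools" array, warning on unsupported
// entries instead of rejecting the whole request. Names are hypothetical.
#include <nlohmann/json.hpp>
#include <cstdio>
#include <set>
#include <string>

using json = nlohmann::json;

// Tool types this sketch pretends the server supports.
static const std::set<std::string> supported_tool_types = {"function"};

json filter_tools(const json & tools) {
    json kept = json::array();
    for (const auto & tool : tools) {
        const std::string type = tool.value("type", "");
        if (supported_tool_types.count(type)) {
            kept.push_back(tool);
        } else {
            // Skip with a warning rather than hard-failing the request.
            fprintf(stderr, "warning: skipping unsupported tool type '%s'\n",
                    type.c_str());
        }
    }
    return kept;
}
```

Skipping with a warning keeps the rest of the request usable, which matters for agent clients like Codex that may attach tools a local server cannot execute.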
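The chunk size limit in the Qwen3A preprocessing fix bounds how much data a single multimodal chunk can carry. A rough sketch of the splitting idea, with `split_chunks` and its parameters as hypothetical stand-ins for the actual MTMD code:

```cpp
// Sketch: cap multimodal (MTMD) chunk sizes so one oversized audio or image
// segment cannot exhaust memory during preprocessing. Hypothetical helper.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split n_tokens into (offset, length) chunks of at most max_chunk tokens.
std::vector<std::pair<size_t, size_t>> split_chunks(size_t n_tokens,
                                                    size_t max_chunk) {
    std::vector<std::pair<size_t, size_t>> chunks;
    for (size_t off = 0; off < n_tokens; off += max_chunk) {
        chunks.emplace_back(off, std::min(max_chunk, n_tokens - off));
    }
    return chunks;
}
```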
Why it matters
Codex CLI compatibility in llama.cpp lets developers swap in locally hosted models within OpenAI's agentic coding workflow, enabling fully offline or self-hosted alternatives. Qwen3A multimodal support extends local inference options for the rapidly adopted Qwen3 family.
Importance: 2/5
Codex CLI compatibility bridges local inference with OpenAI's coding agent ecosystem