vllm — AI Digest

10 May vLLM v0.20.2: TurboQuant 2-bit KV Cache and FlashAttention 4 Default for MoE Serving tools
18 May vLLM v0.21.0: Blackwell MLA Backend, HMA KV Offload, Spec Decode for Reasoning Models vLLM Project tools
2 Jun vLLM v0.22.0: DeepSeek V4 Production Hardening, Rust Frontend, 28.9% Latency Drop tools
9 Jun vLLM Semantic Router v0.3 Themis: Stateful Production Routing with Session-Aware Agentic Routing tools
14 Jun vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention MiniMax tools
17 Jun vLLM v0.23.0: Model Runner V2 Default for Llama and Mistral, Transformers v5, Multi-Tier KV Cache tools
29 Apr vLLM v0.20.0: Third Release in Two Weeks vLLM tools
2 Jun BadHost (CVE-2026-48710): Host-Header Auth Bypass in Starlette Exposes vLLM, LiteLLM, and MCP Servers tools
6 May vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11 tools
12 May vLLM v0.21.0rc1: Python 3.14, CUDA 13.0, and Transformers v5 Compatibility tools
13 May vLLM v0.21.0rc1: PyTorch 2.11, HuggingFace Transformers v5, and Python 3.14 Support tools