-
vLLM v0.20.2: TurboQuant 2-bit KV Cache and FlashAttention 4 Default for MoE Serving
tools
-
vLLM v0.21.0: Blackwell MLA Backend, HMA KV Offload, Spec Decode for Reasoning Models
vLLM Project
tools
-
vLLM v0.22.0: DeepSeek V4 Production Hardening, Rust Frontend, 28.9% Latency Drop
tools
-
vLLM Semantic Router v0.3 Themis: Stateful Production Routing with Session-Aware Agentic Routing
tools
-
vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention
MiniMax
tools
-
vLLM v0.23.0: Model Runner V2 Default for Llama and Mistral, Transformers v5, Multi-Tier KV Cache
tools
-
vLLM v0.20.0: Third Release in Two Weeks
vLLM
tools
-
BadHost (CVE-2026-48710): Host-Header Auth Bypass in Starlette Exposes vLLM, LiteLLM, and MCP Servers
tools
-
vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11
tools
-
vLLM v0.21.0rc1: Python 3.14, CUDA 13.0, and Transformers v5 Compatibility
tools
-
vLLM v0.21.0rc1: PyTorch 2.11, HuggingFace Transformers v5, and Python 3.14 Support
tools