vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11
vLLM v0.20.1, released May 4, 2026, is a patch release that stabilizes DeepSeek V4 on the CUDA 13 + PyTorch 2.11 baseline established in v0.20.0. It fixes a deadlock in the persistent top-k cooperative kernel, adds NVFP4 MoE kernel support for RTX Blackwell workstation GPUs, and improves multi-stream pre-attention GEMM performance. The v0.20.x series also added HuggingFace Transformers v5 support.
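Since the v0.20.x series pins to a new CUDA 13 + PyTorch 2.11 baseline, deployment tooling typically gates upgrades on those versions. A minimal sketch of such a check (the baseline numbers come from this release note; the helper names are hypothetical, not part of vLLM):

```python
def parse_version(v: str) -> tuple[int, ...]:
    # Split a dotted version string into comparable integer parts,
    # so "2.11.0" compares correctly against the (2, 11) baseline.
    return tuple(int(part) for part in v.split("."))

def meets_v020_baseline(torch_version: str, cuda_version: str) -> bool:
    # Baseline per the v0.20.x release notes: PyTorch >= 2.11 on CUDA >= 13.
    return (parse_version(torch_version) >= (2, 11)
            and parse_version(cuda_version) >= (13,))

print(meets_v020_baseline("2.11.0", "13.0"))  # new baseline: True
print(meets_v020_baseline("2.10.1", "12.4"))  # older stack: False
```

In practice the actual versions would be read from `torch.__version__` and `torch.version.cuda` rather than passed as literals.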
Why it matters
vLLM's move to CUDA 13, PyTorch 2.11, and Transformers v5 is a forcing function for the broader ecosystem, and the DeepSeek V4 deadlock fix unblocks production deployments of the leading open MoE model.
Importance: 2/5
Patch release stabilizing DeepSeek V4 on CUDA 13 + PyTorch 2.11 — important for production deployments.
Sources
vLLM Releases — vllm-project/vllm (official)