vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11


vLLM v0.20.1, released May 4, 2026, is a patch release stabilizing DeepSeek V4 on the new CUDA 13 + PyTorch 2.11 baseline established in v0.20.0. Fixes include a cooperative-kernel deadlock in persistent top-k, NVFP4 MoE kernel support for RTX Blackwell workstation GPUs, and multi-stream pre-attention GEMM performance improvements. The v0.20.x series also added HuggingFace Transformers v5 support.
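Before upgrading, it can help to verify that an environment already meets the new v0.20.x baseline. A minimal sketch, assuming the version strings come from `torch.__version__` and `torch.version.cuda`; the `meets_baseline` helper is hypothetical, not part of vLLM:

```python
def meets_baseline(torch_ver: str, cuda_ver: str) -> bool:
    """Check an environment against the vLLM v0.20.x baseline
    (PyTorch 2.11 on CUDA 13, per the release notes)."""
    torch_mm = tuple(int(p) for p in torch_ver.split(".")[:2])  # (major, minor)
    cuda_major = int(cuda_ver.split(".")[0])
    return torch_mm >= (2, 11) and cuda_major == 13

print(meets_baseline("2.11.0", "13.0"))   # True
print(meets_baseline("2.10.1", "12.8"))   # False: pre-baseline stack
```

In practice `pip install "vllm==0.20.1"` pulls a compatible PyTorch itself; the check above is only useful for auditing an existing environment before the upgrade.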

Why it matters

vLLM's move to CUDA 13/PyTorch 2.11/Transformers v5 is a forcing function for the broader ecosystem; the DeepSeek V4 deadlock fix unblocks production deployments of the leading open MoE model.

Importance: 2/5

Patch release stabilizing DeepSeek V4 on CUDA 13 + PyTorch 2.11 — important for production deployments.
