gpu — AI Digest

May 6 SGLang v0.5.11: Speculative Decoding V2 as Default and Eight New Model Architectures tools
May 18 vLLM v0.21.0: Blackwell MLA Backend, HMA KV Offload, Spec Decode for Reasoning Models vLLM Project tools
June 2 vLLM v0.22.0: DeepSeek V4 Production Hardening, Rust Frontend, 28.9% Latency Drop tools
May 6 vLLM v0.20.1: DeepSeek V4 Stabilization on CUDA 13 and PyTorch 2.11 tools