SGLang v0.5.11: Speculative Decoding V2 as Default and Eight New Model Architectures
SGLang v0.5.11 moves its default baseline to CUDA 13 + PyTorch 2.11 and enables Speculative Decoding V2 with overlap scheduling by default, reducing per-step CPU overhead. The release adds support for eight new model architectures, including Gemma 4, GLM-5.1, Qwen3.6, and Kimi-K2.6, and extends LoRA support to frontier-scale MLA-based MoE models such as DeepSeek-V3.
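For readers unfamiliar with the mechanism: speculative decoding has a cheap draft model propose several tokens per step, which the target model then verifies in one pass, so each step can emit more than one token. The toy sketch below shows only that greedy draft-then-verify loop; it is an illustration, not SGLang's implementation (which additionally overlaps the CPU scheduling of the next step with GPU work), and all function names here are invented for the example.

```python
def draft_propose(prefix, k, draft_next):
    """Draft model greedily proposes k tokens after the prefix."""
    out = list(prefix)
    proposed = []
    for _ in range(k):
        tok = draft_next(out)  # toy stand-in for a cheap draft model
        proposed.append(tok)
        out.append(tok)
    return proposed

def verify(prefix, proposed, target_next):
    """Target model accepts the longest prefix of proposed tokens that
    matches its own greedy choice; on a mismatch it substitutes its own
    token, and if everything matches it emits one bonus token."""
    out = list(prefix)
    accepted = []
    for tok in proposed:
        want = target_next(out)  # toy stand-in for the target model
        if tok != want:
            accepted.append(want)  # target's correction ends the step
            return accepted
        accepted.append(tok)
        out.append(tok)
    accepted.append(target_next(out))  # all drafts accepted: bonus token
    return accepted

# Toy "models": next token is just a function of sequence length.
target = lambda seq: len(seq) % 3
drafter = lambda seq: 99 if len(seq) == 2 else len(seq) % 3  # diverges once

proposed = draft_propose([0], 3, drafter)   # → [1, 99, 0]
step = verify([0], proposed, target)        # accepts 1, corrects 99 → 2
print(step)                                 # → [1, 2]
```

Each verified step emits between one and k+1 tokens, which is why raising the acceptance rate (and cutting per-step CPU cost, as V2 does) directly raises decoding throughput.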
Why it matters
Making Speculative Decoding V2 the default raises the throughput baseline for every SGLang deployment, and LoRA on DeepSeek-V3/Kimi-K2 unlocks serving fine-tuned variants of the leading open MoE models at production scale.
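As background on why LoRA serving is cheap enough to extend to frontier-scale models: a LoRA adapter keeps the base weights W frozen and adds only a low-rank delta, so the effective weight is W' = W + (alpha/r)·B·A with B and A of rank r. The pure-Python illustration below uses toy 2×2 matrices and is not SGLang code:

```python
def matmul(a, b):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply_lora(w, a, b, alpha, r):
    """Effective weight W' = W + (alpha / r) * (B @ A).

    w: frozen base weight (n x m); a: r x m; b: n x r.
    """
    delta = matmul(b, a)          # low-rank update, rank r
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

w = [[1, 0], [0, 1]]              # frozen base weight
a = [[2, 3]]                      # rank-1 factors (r = 1)
b = [[1], [1]]
print(apply_lora(w, a, b, alpha=2, r=1))  # → [[5.0, 6.0], [4.0, 7.0]]
```

Because only B and A are adapter-specific, a server can hold one copy of the base MoE weights and swap small per-adapter factors, which is what makes LoRA viable even on models the size of DeepSeek-V3.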
Importance: 3/5
Major baseline upgrade (CUDA 13 + PyTorch 2.11) + speculative decoding V2 as default — affects all SGLang inference deployments.
Sources
official
SGLang Releases — sgl-project/sglang