DeepSeek Open-Sources DSpark: 57–85% Inference Speedup for V4 in Production
DeepSeek
DeepSeek and the Peking University NLP Lab released DSpark (Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation), a framework that speeds up DeepSeek-V4-Flash inference by 60–85% and DeepSeek-V4-Pro inference by 57–78% relative to the prior MTP-1 baseline. It is already live in production for both V4 variants. DeepSpec, the training and evaluation codebase, is open-sourced under the MIT license on GitHub (`deepseek-ai/DeepSpec`), and Hugging Face model cards have been published for DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark.
Why it matters
A 57–85% inference speedup with no loss in output quality is immediately useful to anyone running DeepSeek V4 at scale. Because DeepSpec is open source, the community can adapt the draft-model training recipe to other base models.
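To put the headline range in serving terms, here is a rough back-of-the-envelope sketch. It assumes the reported percentages are throughput gains over the MTP-1 baseline (the item does not state the exact metric), and the helper name `relative_cost` is hypothetical, introduced only for illustration.

```python
def relative_cost(speedup_pct: float) -> float:
    """Fraction of baseline decode time left after a throughput gain.

    Assumption: speedup_pct is a throughput increase in percent, so the
    time per generated token shrinks by a factor of 1 / (1 + pct/100).
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


# Low and high ends of the range reported in the item.
for pct in (57, 85):
    saved = 1.0 - relative_cost(pct)
    print(f"+{pct}% throughput -> {saved:.1%} less decode time per token")
# +57% throughput -> 36.3% less decode time per token
# +85% throughput -> 45.9% less decode time per token
```

Under that assumption, the same decoding workload would need roughly 36–46% less decode time than with MTP-1. Real savings depend on batch size, sequence length, and how much of the request is spent outside decoding.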
Importance: 3/5
DeepSeek open-sourced DSpark, which delivers a 57–85% inference speedup for V4 and is live in production; covered by official and media sources.