DeepSeek Open-Sources DSpark: 57–85% Inference Speedup for V4 in Production

DeepSeek


DeepSeek and the Peking University NLP Lab have released DSpark (Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation), a speculative-decoding framework that speeds up DeepSeek-V4-Flash inference by 60–85% and DeepSeek-V4-Pro inference by 57–78% relative to the prior MTP-1 baseline. It is already running in production for both V4 variants. DeepSpec, the training and evaluation codebase, is open-sourced under the MIT license on GitHub (`deepseek-ai/DeepSpec`), and Hugging Face model cards have been published for DeepSeek-V4-Pro-DSpark and DeepSeek-V4-Flash-DSpark.
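To put the reported percentages in per-token terms, the short sketch below converts a speedup into the fraction of decoding time saved. It assumes that "X% speedup" means throughput rises by X% over the MTP-1 baseline (so 60% corresponds to 1.60x); the announcement does not spell out the definition, and the function name is purely illustrative.

```python
def per_token_time_reduction(speedup_pct: float) -> float:
    """Fraction of per-token time saved, ASSUMING throughput rises by speedup_pct percent."""
    multiplier = 1.0 + speedup_pct / 100.0  # e.g. 60% -> 1.60x throughput
    return 1.0 - 1.0 / multiplier           # time per token scales as 1 / throughput


# Reported ranges from the announcement (percent over the MTP-1 baseline).
reported = {
    "V4-Flash": (60, 85),
    "V4-Pro": (57, 78),
}

for model, (low, high) in reported.items():
    print(
        f"{model}: {per_token_time_reduction(low):.1%} to "
        f"{per_token_time_reduction(high):.1%} less time per token"
    )
```

Under that assumption, the headline range corresponds to roughly 36–46% less time per generated token, which is the figure that matters when estimating serving capacity.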

Why it matters

A 57–85% inference speedup without quality loss is of immediate practical value to anyone running DeepSeek V4 at scale, and it is already deployed in production for both variants. Because DeepSpec is open source, the draft-model training recipe is available for the community to adapt to other base models.

Importance: 3/5

DeepSeek DSpark open-sourced with 57–85% inference speedup for V4 live in production; official + media coverage

deepseek inference speculative-decoding open-source deepseek-v4 mit

Sources

official deepseek-ai/DeepSpec — MIT-licensed codebase for DSpark on GitHub

official DeepSeek-V4-Pro-DSpark model card — HuggingFace

media DeepSeek Releases DSpark — MarkTechPost