vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention
MiniMax
On June 12, 2026, the vLLM team published a blog post announcing day-0 serving support for MiniMax M3, a 456B-parameter open-weight model with a 1M-token context window, native multimodal input, and the MiniMax Sparse Attention (MSA) architecture. The open weights were released approximately June 10–11. Deployment requires the '--block-size 128' flag because of MSA's sparse/index cache requirements. AMD announced simultaneous day-0 support on Instinct GPUs. On Fireworks AI, M3 is available at pricing described as roughly 1/20th the cost of comparable closed models.
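As a rough illustration of the deployment note above, the sketch below assembles a `vllm serve` command line that carries the required block size. The model identifier `MiniMaxAI/MiniMax-M3` is a placeholder assumption (the announcement as summarized here does not give the repository name), and the helper `build_serve_command` is hypothetical. Any further options, such as parallelism or context length, would depend on the hardware and are left to the caller.

```python
import shlex
from typing import List, Optional

# Placeholder assumption: the real repository name is not given in the source.
MODEL_ID = "MiniMaxAI/MiniMax-M3"


def build_serve_command(
    model_id: str,
    block_size: int = 128,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a `vllm serve` invocation that sets the KV-cache block size."""
    cmd = ["vllm", "serve", model_id, "--block-size", str(block_size)]
    if extra_args:
        # Caller-supplied options (hardware-specific) are appended unchanged.
        cmd.extend(extra_args)
    return cmd


command = build_serve_command(MODEL_ID)
# Print a shell-safe string that can be pasted into a terminal.
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting issues.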
Why it matters
Day-0 inference engine support means practitioners can immediately run M3 locally or on-prem without waiting for framework updates. With Anthropic's top models offline, M3's 1M-token context, served at mixture-of-experts (MoE) efficiency, becomes a practical alternative for long-document coding and analysis pipelines.
Importance: 3/5
Day-0 vLLM and AMD support for a major open-weight frontier model; its arrival during an Anthropic model outage increases its practical relevance.