vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention

MiniMax


On June 12, 2026, the vLLM team published a blog post announcing day-0 serving support for MiniMax M3, a 456B-parameter open-weight model with a 1M-token context window, native multimodal input, and the MiniMax Sparse Attention (MSA) architecture. The open weights were released approximately June 10–11. Deployment requires the '--block-size 128' flag because of MSA's sparse/index cache requirements. AMD announced day-0 support on Instinct GPUs at the same time. M3 is also available on Fireworks AI, with pricing described as roughly 1/20th the cost of comparable closed models.
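As a rough illustration of the deployment note above, the sketch below assembles a `vllm serve` command line that passes the block-size flag. The helper name `build_vllm_serve_command` and the model identifier are hypothetical placeholders; the summary does not give the actual M3 repository ID or any other flags the model may need, so treat this as a starting point, not a verified launch recipe.

```python
import shlex


def build_vllm_serve_command(model_id: str, block_size: int = 128) -> list[str]:
    """Assemble argv for `vllm serve` with an explicit KV-cache block size.

    The default of 128 reflects the requirement reported in the summary;
    the helper itself is illustrative and not part of vLLM.
    """
    return ["vllm", "serve", model_id, "--block-size", str(block_size)]


# Placeholder identifier: substitute the real M3 repo ID or a local weights path.
cmd = build_vllm_serve_command("PLACEHOLDER/minimax-m3")

# Print a shell-safe version of the command for copy/paste or logging.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly, which avoids shell-quoting problems if the model path contains spaces.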

Why it matters

Day-0 inference engine support means practitioners can run M3 locally or on-prem immediately, without waiting for framework updates. With Anthropic's top models offline, M3's 1M-token context at MoE efficiency becomes a practical alternative for long-document coding and analysis pipelines.

Importance: 3/5

Day-0 vLLM and AMD support for a major open-weight frontier model; its arrival during an Anthropic model outage increases practical relevance.

vllm minimax inference open-weights long-context multimodal moe serving open-source release
