#moe
- DeepSeek V4: Official Open-Source Release with Day-0 Adaptation for Huawei Ascend (DeepSeek, models-llm)
- MiniMax Releases M3: Open-Weight Frontier Model with 1M-Token Context and MSA Architecture (MiniMax, models-llm)
- NVIDIA Nemotron 3 Ultra: Open 550B MoE Model Now Available for Agentic Workloads (NVIDIA, models-llm)
- MiniMax M3 Open Weights Released: 1M Context, MoE, Frontier Coding (MiniMax, models-llm)
- Zhipu AI Open-Sources GLM-5.2 Under MIT License with 1M-Token Context (Zhipu AI, models-llm)
- Zhipu AI Releases GLM-5.2 Open Weights: 753B MoE with 1M-Token Context under MIT License (Zhipu AI / Z.ai, models-llm)
- Zyphra Releases ZAYA1-8B: Open Reasoning MoE Model Trained on AMD Hardware (Zyphra, models-llm)
- Lance: 3B Unified Multimodal Model for Understanding, Generation, and Editing (ByteDance Research, research; 314 HF upvotes)
- JetBrains Open-Sources Mellum2: 12B MoE Coding Model for Multi-Model Pipelines (JetBrains, models-llm)
- Cohere North Mini Code: 30B Apache-2.0 MoE Coding Model for Agentic Workflows (Cohere, models-llm)
- Kimi K2.7-Code HighSpeed: 6× Throughput for Production Coding Agent Pipelines (Moonshot AI, models-llm)
- Kwai Keye-VL-2.0: Open-Source 30B MoE Multimodal Model with 256K Context for Long Video (Kwai, research)
- Moonshot AI Releases Kimi K2.7-Code: 1T-Parameter Open-Weight Coding Model with Vision (Moonshot AI, models-llm)
- vLLM Adds Day-0 Support for MiniMax M3 Open Weights with 1M-Context Sparse Attention (MiniMax, tools)
- Zhipu AI Releases GLM-5.2: 744B MoE with 1M-Token Context and Coding-First Design (Zhipu AI, models-llm)
- Sber Unveils Kandinsky 6.0 Image: Flagship Image Generation Model (Sber, image)