MiniCPM-o 4.5: Real-Time Full-Duplex Omni-Modal AI on Edge Devices

OpenBMB / Tsinghua University


MiniCPM-o 4.5 is a 9B-parameter end-to-end model that achieves real-time full-duplex omni-modal interaction: it processes continuous video and audio input while simultaneously generating text and speech output, with neither side blocking the other. Built on SigLIP2, Whisper-medium, CosyVoice2, and Qwen3-8B, it runs on edge devices with under 12 GB of RAM and approaches Gemini 2.5 Flash performance on vision-language benchmarks.
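To make "full-duplex without mutual blocking" concrete, here is a toy sketch of the general idea: an input loop and an output loop run concurrently and exchange chunks through a queue, so neither waits for the other to finish. This is only an illustration of the concurrency pattern, not MiniCPM-o's actual mechanism or API; the names `perceive`, `respond`, `full_duplex`, and `run_demo` are hypothetical, and the string "frames" stand in for real audio and video chunks.

```python
import asyncio


async def perceive(frames, inbox):
    """Input side (hypothetical): push incoming chunks without waiting on output."""
    for frame in frames:
        await inbox.put(frame)
        await asyncio.sleep(0)  # yield control so the output side can run
    await inbox.put(None)  # sentinel: the input stream has ended


async def respond(inbox, log):
    """Output side (hypothetical): emit a reply chunk as each input chunk arrives."""
    while True:
        frame = await inbox.get()
        if frame is None:
            break
        log.append(f"reply-to:{frame}")


async def full_duplex(frames):
    """Run both sides concurrently; neither loop blocks the other."""
    inbox = asyncio.Queue()
    log = []
    await asyncio.gather(perceive(frames, inbox), respond(inbox, log))
    return log


def run_demo(frames):
    """Synchronous entry point for the toy example."""
    return asyncio.run(full_duplex(frames))
```

Calling `run_demo(["video-0", "audio-0"])` returns one reply per input chunk, in arrival order. A half-duplex system, by contrast, would have to finish reading the whole input before producing any output.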

Why it matters

MiniCPM-o 4.5 is the first open-source model to achieve full-duplex omni-modal interaction at edge-device scale. It demonstrates that simultaneous see-listen-speak capabilities, competitive with Gemini 2.5 Flash on vision-language benchmarks, can fit in a 9B open-weight model, which is significant for on-device AI assistant deployment.

Importance: 3/5

Novel open-weight on-device full-duplex model approaching closed frontier performance on VLM benchmarks.

Sources