JoyAI-VL-Interaction: Open-Source 8B Real-Time VLM with Autonomous Turn-Taking
JD.com
JoyAI-VL-Interaction (arXiv 2606.14777) is an 8B VLM for continuous, real-time video interaction: it watches a live video stream and autonomously decides when to speak and when to stay silent. It is released with its training recipe, time-aligned interaction data, and a fully deployable open-source system (pluggable ASR/TTS, memory, and a background agent API). In human evaluations across six real-world scenarios, raters preferred it over the Doubao and Gemini in-app assistants.
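The summary does not describe how the model makes its speak-or-stay-silent decision internally. As a rough mental model only, the sketch below shows one way an always-on loop with pluggable ASR/TTS and a short rolling memory could be wired together. Every name in it (`Frame`, `interaction_loop`, `decide`, `transcribe`, `speak`, `memory_size`) is hypothetical and is not taken from the JoyAI-VL-Interaction release; `decide` is a stand-in for the VLM's per-step choice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Frame:
    """One time step of the live stream (hypothetical structure)."""
    timestamp: float      # seconds since the stream started
    image: object         # decoded video frame; opaque in this sketch
    audio: bytes = b""    # raw user audio captured during this step


# Hypothetical pluggable components: any ASR/TTS backend can be wrapped
# in plain callables with these shapes.
Transcribe = Callable[[bytes], str]
Speak = Callable[[str], None]
# Stand-in for the VLM: given the current frame, the latest user transcript,
# and recent memory, return text to say, or None to stay silent.
Decide = Callable[[Frame, str, List[str]], Optional[str]]


def interaction_loop(
    frames: Iterable[Frame],
    decide: Decide,
    transcribe: Transcribe,
    speak: Speak,
    memory_size: int = 8,
) -> List[str]:
    """Run a speak-or-stay-silent loop over a stream and return the full log."""
    memory: List[str] = []
    for frame in frames:
        # Only run ASR when this step actually contains user audio.
        user_text = transcribe(frame.audio) if frame.audio else ""
        if user_text:
            memory.append(f"user@{frame.timestamp:.1f}: {user_text}")
        # The decision sees only the most recent `memory_size` entries.
        reply = decide(frame, user_text, memory[-memory_size:])
        if reply is not None:  # the model chose to take the turn
            speak(reply)
            memory.append(f"assistant@{frame.timestamp:.1f}: {reply}")
        # Otherwise stay silent and keep watching the next frame.
    return memory


if __name__ == "__main__":
    spoken: List[str] = []
    frames = [Frame(0.0, None), Frame(0.5, None, b"hi"), Frame(1.0, None)]
    # Toy policy: reply only when the user has just said something.
    log = interaction_loop(
        frames,
        decide=lambda f, user_text, mem: "Hello!" if user_text else None,
        transcribe=lambda audio: audio.decode(),
        speak=spoken.append,
    )
    print(spoken)
    print(log)
```

The point of the structure is that silence is the default at every step: the loop keeps consuming frames and only calls TTS when the decision function returns text, which is the behavior the summary attributes to the model. A real system would also need to react to visual events without any user speech, which a learned `decide` could do since it receives the frame on every step.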
Why it matters
The paper received 223 upvotes on Hugging Face Daily Papers. It is one of the first 8B models built for always-on video streaming with autonomous turn-taking, which makes it closer to a real-time assistant than to a chatbot, and it comes with a full open-source release (model, data, and system).
Importance: 4/5
223 HF upvotes + novel autonomous turn-taking VLM with full open-source release