#vision-language
- MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Images Technion research
- JoyAI-VL-Interaction: Open-Source 8B Real-Time VLM with Autonomous Turn-Taking JD.com research
- MemLens: Benchmark for Multimodal Long-Term Memory in Vision-Language Models NVIDIA research
- Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning research
- Tencent releases HY-Embodied-0.5-X update for embodied agents Tencent models-llm