Lance: 3B Unified Multimodal Model for Understanding, Generation, and Editing (314 HF upvotes)

ByteDance Research

Research · official + media · 2 sources · ~1 min read

Lance is a native unified multimodal model with 3B active parameters, trained from scratch, that supports image and video understanding, generation, and editing. It uses a dual-stream mixture-of-experts architecture over shared, interleaved multimodal sequences with modality-aware rotary positional encoding. It substantially outperforms existing open-source unified models on image and video generation benchmarks while retaining strong comprehension.
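To make the architecture description more concrete, here is a minimal NumPy sketch of one dual-stream layer over an interleaved sequence. It is not Lance's code, and the summary does not give its details. The stream tags (`UND`, `GEN`), the helpers `build_positions`, `rope_rotate`, `multiaxis_rope`, and `dual_stream_layer`, and the weight names are all hypothetical. The sketch assumes hard routing by modality: each token uses its own stream's projection and FFN weights, while attention runs jointly over the whole sequence. As a stand-in for "modality-aware" RoPE, it borrows the (t, h, w) multi-axis scheme popularized by Qwen2-VL's M-RoPE. Attention masking, multiple heads, normalization, and the output projection are omitted.

```python
import numpy as np

# Hypothetical stream tags (not from the Lance paper):
# UND = understanding stream (text tokens), GEN = generation stream (image latents).
UND, GEN = 0, 1


def build_positions(segments):
    """Assign stream tags and (t, h, w) position ids to an interleaved sequence.

    segments: list of ("text", n_tokens) or ("image", H, W).
    Text tokens get (p, p, p); image patches share t = p and index h/w on a grid
    (an M-RoPE-style assumption, borrowed from Qwen2-VL).
    """
    mods, pos, p = [], [], 0
    for seg in segments:
        if seg[0] == "text":
            for _ in range(seg[1]):
                mods.append(UND)
                pos.append((p, p, p))
                p += 1
        else:
            _, H, W = seg
            for i in range(H):
                for j in range(W):
                    mods.append(GEN)
                    pos.append((p, p + i, p + j))
            p += max(H, W)  # next segment starts after the largest grid extent
    return np.array(mods), np.array(pos, dtype=float)


def rope_rotate(x, pos, base=10000.0):
    """1D rotary embedding: rotate channel pairs of x (n, d) by angles pos * freq."""
    d = x.shape[1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,)
    ang = pos[:, None] * inv_freq[None, :]          # (n, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def multiaxis_rope(x, pos3):
    """Split channels into three even chunks and rotate each by one axis of pos3 (t, h, w)."""
    g = x.shape[1] // 3 // 2 * 2
    parts = [rope_rotate(x[:, a * g:(a + 1) * g], pos3[:, a]) for a in range(3)]
    parts.append(x[:, 3 * g:])  # leftover channels stay unrotated
    return np.concatenate(parts, axis=1)


def dual_stream_layer(h, modality, pos3, params):
    """Per-stream QKV and FFN weights; one joint attention over the whole sequence."""
    d = h.shape[1]
    q, k, v = np.empty_like(h), np.empty_like(h), np.empty_like(h)
    for s in (UND, GEN):  # hard routing: each token uses its own stream's projections
        m = modality == s
        q[m] = h[m] @ params[s]["wq"]
        k[m] = h[m] @ params[s]["wk"]
        v[m] = h[m] @ params[s]["wv"]
    q, k = multiaxis_rope(q, pos3), multiaxis_rope(k, pos3)
    scores = q @ k.T / np.sqrt(d)  # full (unmasked) attention across both streams
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    h = h + attn @ v  # residual connection around attention
    out = np.empty_like(h)
    for s in (UND, GEN):  # stream-specific ReLU FFN with residual
        m = modality == s
        out[m] = h[m] + np.maximum(h[m] @ params[s]["w1"], 0.0) @ params[s]["w2"]
    return out


# Usage with random weights
rng = np.random.default_rng(0)
d = 24
params = {s: {name: rng.normal(scale=0.1, size=(d, d))
              for name in ("wq", "wk", "wv", "w1", "w2")} for s in (UND, GEN)}
modality, pos3 = build_positions([("text", 5), ("image", 2, 3), ("text", 2)])
h = rng.normal(size=(len(modality), d))
out = dual_stream_layer(h, modality, pos3, params)
print(out.shape)  # (13, 24)
```

In this sketch, changing the generation stream's FFN weights changes only the generation-token outputs: understanding tokens still see generation tokens through the joint attention, but never pass through the generation FFN. This separation of weights with a shared sequence is one plausible reading of "dual-stream" here, not a confirmed description of Lance.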

Why it matters

314 Hugging Face upvotes. Demonstrates that a lean 3B unified model, trained with a careful multi-task recipe, can rival much larger single-task specialists across the full understanding-generation spectrum.

Importance: 3/5

State-of-the-art unified understanding and generation in 3B parameters from ByteDance, challenging both specialist models and larger unified models.

Sources