ByteDance Launches Seed-Audio 1.0: Unified Speech, Music, and Ambient Sound Generation

ByteDance

Category: Audio. Sources: 3 (official + media). Reading time: ~1 min

Announced alongside Seedance 2.5 at the Volcano Engine FORCE conference on June 23, Seed-Audio 1.0 generates multi-character dialogue with distinct voices, background music, sound effects, and ambient soundscapes in a single end-to-end pass of up to 2 minutes. It accepts text prompts plus reference audio for voice style matching and cloning. The model is available through ByteDance's Volcano Ark API and is integrated into CapCut, Jimeng, and Fanqie.
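To make the reported inputs concrete, here is a minimal Python sketch of how a client might assemble one generation request: a text prompt, optional per-speaker reference audio, and a length capped at the 2-minute limit. The sources do not describe the actual Volcano Ark request schema, so every class, field, and key name below is hypothetical and chosen only for illustration. The sketch builds and validates a payload locally and does not call any real endpoint.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Reported ceiling for a single end-to-end pass (2 minutes).
MAX_DURATION_SECONDS = 120


@dataclass
class Speaker:
    """One character in the dialogue (hypothetical structure)."""
    name: str
    voice_description: str
    # Optional clip used for voice style matching or cloning.
    reference_audio_path: Optional[str] = None


@dataclass
class AudioRequest:
    """A single generation request (hypothetical structure)."""
    prompt: str
    duration_seconds: int
    speakers: List[Speaker] = field(default_factory=list)

    def to_payload(self) -> dict:
        """Validate the request and return a plain dict ready to serialize."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError(
                f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
            )
        return {
            "prompt": self.prompt,
            "duration_seconds": self.duration_seconds,
            "speakers": [
                {
                    "name": s.name,
                    "voice_description": s.voice_description,
                    "reference_audio_path": s.reference_audio_path,
                }
                for s in self.speakers
            ],
        }
```

A caller would construct an `AudioRequest` with a scene description and one `Speaker` per character, then serialize `to_payload()` in whatever format the real API expects. Checking the duration on the client side avoids sending a request that exceeds the reported limit.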

Why it matters

Seed-Audio 1.0 positions ByteDance as a full-stack generative media provider by unifying voice, music, and effects in one model. That puts it in direct competition with ElevenLabs' multi-product suite and reduces the need for separate specialized tools in content pipelines.

Importance: 3/5

Unifies speech, music, and ambient audio in one model, from a major Chinese lab with a deployment reach of 180T tokens per day.

tts music-generation voice-cloning audio bytedance chinese-lab release

Sources

official ByteDance Seed Models — Official List

media ByteDance's Seedance 2.5 breaks the 30-second barrier (covers full FORCE conference suite) — The Decoder

media ByteDance unveils Seedance 2.5 (covers full conference releases) — The Next Web