ByteDance Launches Seed-Audio 1.0: Unified Speech, Music, and Ambient Sound Generation
ByteDance
Announced alongside Seedance 2.5 at the Volcano Engine FORCE conference on June 23, Seed-Audio 1.0 generates multi-character dialogue with distinct voices, background music, sound effects, and ambient soundscapes in a single end-to-end pass of up to 2 minutes. It accepts text prompts and reference audio for voice style matching and cloning. The model is available via ByteDance's Volcano Ark API and is integrated into CapCut, Jimeng, and Fanqie.
Why it matters
Seed-Audio 1.0 positions ByteDance as a full-stack generative media provider by unifying voice, music, and effects in one model. That puts it in direct competition with ElevenLabs' multi-product suite and reduces the need for separate specialized tools in content pipelines.
Importance: 3/5
Unified speech, music, and ambient audio in one model, from a major Chinese lab with a deployment reach of 180T tokens per day