
TL;DR
ByteDance announced Seedance 2.5 at its Beijing conference, generating 30-second native 4K video from up to 50 multimodal reference inputs.
ByteDance unveiled Seedance 2.5, a video generation model that produces 30-second clips at native 4K resolution from a single prompt, on Tuesday at its Volcano Engine FORCE conference in Beijing. The company skipped four intermediate version numbers, jumping straight from Seedance 2.0 to signal what it described as a generational leap.
An enterprise beta is already live, with public launch targeted for early July. CEO Liang Rubo told the conference that climbing the AI summit is the company’s top priority, with its model-as-a-service business evolving into a foundational operation backed by long-term investment.
The headline upgrade is reference capacity: the model accepts up to 50 multimodal inputs, including images, audio clips, 3D white models, and style references, up from 12 in its predecessor. Those inputs give Seedance 2.5 far more granular control over style, motion, and composition than a text prompt alone.
The model generates at 4K natively rather than upscaling from a lower resolution, a distinction that matters for professional production pipelines. It supports 10-bit colour depth for smoother gradients and more room for post-production colour grading. ByteDance also claims 20 percent better prompt adherence, meaning fewer generations before a usable result.
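The colour-depth point comes down to simple arithmetic, sketched below in Python. This is an illustrative calculation only, not anything ByteDance has published: the function names are hypothetical, and the sketch assumes full-range encoding, whereas broadcast video often reserves some code values.

```python
def levels_per_channel(bit_depth: int) -> int:
    """Distinct code values a single colour channel can take."""
    return 2 ** bit_depth


def total_colours(bit_depth: int, channels: int = 3) -> int:
    """Distinct colours across all channels (assumes RGB-style triples)."""
    return levels_per_channel(bit_depth) ** channels


def smallest_step_percent(bit_depth: int) -> float:
    """Smallest brightness step as a percentage of the full range."""
    return 100 / (levels_per_channel(bit_depth) - 1)


if __name__ == "__main__":
    for depth in (8, 10):
        print(
            f"{depth}-bit: {levels_per_channel(depth)} levels per channel, "
            f"{total_colours(depth):,} colours, "
            f"step of {smallest_step_percent(depth):.3f}% of range"
        )
```

Under these assumptions, 10-bit video has four times as many levels per channel as 8-bit (1,024 against 256), so each step in a gradient is about a quarter the size. That is why banding is less visible and why footage tolerates heavier grading.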
Audio is now co-processed within the same latent space as visual signals, producing native synchronisation between onscreen actions and their corresponding sound effects. A new 3D white-box preview function lets creators generate low-fidelity animations before committing to a full-quality render. Together, the features position the model as a production tool rather than a novelty generator.
The announcement comes three months after ByteDance was forced to add watermarking and IP guardrails to Seedance 2.0 following cease-and-desist letters from Disney, Warner Bros Discovery, Paramount, and Netflix. A viral deepfake of Tom Cruise fighting Brad Pitt on a rooftop drew a formal complaint from the Motion Picture Association and a rebuke from SAG-AFTRA.
ByteDance paused the global rollout in mid-March and resumed it through CapCut only in late March, once face-blocking filters, C2PA watermarks, and copyrighted character detection were in place. No timeline has been offered for making the new model available in the United States.
The competitive context has shifted dramatically since February. OpenAI shut down Sora in March after the video tool peaked at roughly one million users and reportedly cost about a million dollars a day to operate, generating just over two million dollars in total revenue.
Google’s Veo 3.1 has filled much of the vacuum, offering native 4K output, audio generation, and up to three reference images for style control. But the new ByteDance model substantially exceeds Veo’s reference input capacity, accepting 50 inputs to Veo’s three, a gap that matters for professional workflows.
The AI video generation market has fragmented rapidly, with Chinese models moving faster on production tooling than Western competitors. Third-party platforms like Reallusion’s AI Studio have already built professional pipelines around the predecessor model, and Runway’s fourth-generation tool has dropped out of the Artificial Analysis top 10.
Whether the new model can reach global markets without reigniting the copyright battles that stalled its predecessor remains the central question. ByteDance has the model, the distribution through CapCut’s 400 million monthly active users, and the vertical integration from generation to editing to sharing. What it does not yet have is a settlement with Hollywood, and every feature that makes the model more capable also raises the stakes of that unresolved conflict.
Source: The Next Web



