Alibaba’s action was the company’s most recent attempt to introduce Sora-like video-generating capabilities, as Chinese businesses scramble to establish themselves in the AI video market.
Tora, a video-creation tool built on the open-source OpenSora framework, is the Chinese tech giant's newest endeavour in video artificial intelligence (AI).
A team of five Alibaba researchers published a paper last week introducing Tora, a video-generation system that uses OpenSora as its base. (Alibaba owns the South China Morning Post.)
According to the research, published on the repository website arXiv, the Tora framework builds on the Diffusion Transformer (DiT) design, the architecture underpinning Sora, the text-to-video model OpenAI announced in February.
The first "trajectory-oriented DiT framework for video generation," as the researchers describe it, ensures that generated motions accurately follow specified trajectories while simulating the dynamics of the physical world.
The authors stated, “We customized OpenSora’s workflow to convert unprocessed videos into superior video-text pairings and utilize an optical flow estimator for trajectory extraction.”
The paper cites video clips, ranging from a wooden sailing boat on a river to cyclists riding along a highway, as examples of objects following predetermined paths. The researchers say Tora can generate videos guided by text, images, trajectories, or a combination of these.
The researchers did not specify a release date for the new tool, describing the project as "ongoing."
The move by Hangzhou-based Alibaba comes amid a rush of Chinese companies seeking to establish a presence in the AI video space.
Chinese start-up Shengshu AI launched its text-to-video tool Vidu in July, becoming the latest company in the country to open such a service to the public, after Zhipu AI and Kuaishou Technology. Registered users can create four- or eight-second videos.
That launch came only days after Zhipu AI, one of China's four new "AI Tigers," unveiled its Ying video-generation model, which accepts both text and image prompts and can produce six-second video clips in about 30 seconds.
Tora is not Alibaba's first move in AI video generation. The company debuted Emote Portrait Alive, or EMO, an AI video-generation model, in February.
Described as an "expressive audio-driven portrait-video generation framework," the model can create an animated avatar video, complete with poses and facial expressions, from a single still reference image and an audio speech sample.
The paper did not specify whether Tora will be connected to EMO or to Tongyi Qianwen, Alibaba's self-developed family of large language models.