The company’s OmniHuman-1 multimodal model can create vivid videos of people speaking, singing, and moving with a quality “significantly outperforming existing audio-conditioned human video-generation methods”, the ByteDance team behind the product said in a paper. AI-generated images, videos and audio of real people are often referred to as deepfakes, a technology that is increasingly used in fraud as well as in more harmless entertainment.
ByteDance has become one of the hottest AI companies in China. Its Doubao app is currently the most popular consumer-facing AI app in the country. The company has not yet released OmniHuman-1 to the public, but sample clips have gone viral.
One notable demo features a 23-second video of Albert Einstein delivering a speech. TechCrunch’s Kyle Wiggers described the model’s output as “shockingly good” and “perhaps the most realistic deepfake videos to date”.
The model highlights the advancements Chinese developers are making despite Washington’s efforts to curb the country’s AI progress. The launch follows OpenAI widening the release of its video-generation tool Sora, which was made publicly available to ChatGPT Plus and Pro users in December.