Google DeepMind unveils V2A, a new AI model that can generate soundtracks and dialogue for videos
Video generation models like Sora, Dream Machine, Veo and Kling are advancing at a rapid pace, allowing users to generate videos from text prompts. But the majority of these systems are limited to silent videos. Google DeepMind seems to be aware of the problem and is now working on a new AI model that can generate soundtracks and dialogue for videos.
In a blog post, the tech giant’s AI research lab unveiled V2A (video-to-audio), a new work-in-progress AI model that “combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.”