New Google AI Creates Audio From Video & Prompts


Google DeepMind has showcased the latest results of its generative AI video-to-audio research. The system combines what is seen on screen with the user’s written prompt to create synced audio.

Called V2A (short for video-to-audio), it can be paired with video-generation models such as Veo. It can create soundtracks, sound effects, and dialogue for on-screen action.

DeepMind also claims it can generate “an unlimited number of soundtracks for any video input” by steering the output with positive and negative prompts, which push the result toward desired sounds or away from unwanted ones.

The system first encodes the video input into a compressed representation. Starting from random noise, it then iteratively refines the audio, guided by the visual input and the user’s text prompt.

The audio output is then decoded and exported as a waveform, which can be recombined with the video input.
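
To make the encode, refine, and decode steps concrete, here is a toy sketch in Python. It is not DeepMind’s code: every function name and parameter is hypothetical, the "encoder" just averages pixel values, `prompt_gain` stands in for the text prompt, and a fixed pull toward a target replaces the trained model that would normally predict each refinement step.

```python
import random


def encode_video(frames):
    """Hypothetical encoder: compress each frame (pixel values in [0, 1]) to its mean."""
    return [sum(frame) / len(frame) for frame in frames]


def refine_audio(video_code, prompt_gain=1.0, samples_per_frame=4,
                 steps=50, rate=0.2, seed=0):
    """Start from random noise and nudge it, step by step, toward the conditioning."""
    rng = random.Random(seed)
    # One target value per audio sample, repeated per frame so audio stays
    # aligned with the video. prompt_gain is a stand-in for the text prompt.
    target = [prompt_gain * c for c in video_code for _ in range(samples_per_frame)]
    audio = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # A real system would use a learned model here; this is a fixed pull.
        audio = [a + rate * (t - a) for a, t in zip(audio, target)]
    return audio


def decode_to_pcm16(audio):
    """Clip to [-1, 1] and scale to 16-bit integer waveform samples."""
    return [int(round(max(-1.0, min(1.0, a)) * 32767)) for a in audio]


frames = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # three tiny made-up "frames"
code = encode_video(frames)
waveform = decode_to_pcm16(refine_audio(code))
```

Because each audio sample is tied to a specific frame from the start, the output comes out aligned with the video without a separate syncing pass, which is the property the article describes.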


The user isn’t required to go in and manually sync the audio and video tracks, because the system does it automatically.

The DeepMind team said, “By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes while responding to the information provided in the annotations or transcripts.”

The system isn’t flawless yet. First, the quality of the output audio depends on the quality of the video input. Second, the system can falter when the video contains artifacts or distortions.

DeepMind said synchronizing generated dialogue with characters’ on-screen lip movements is still a challenge as well.

“V2A attempts to generate speech from the input transcripts and synchronize it with characters’ lip movements. But the paired video-generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn’t generate mouth movements that match the transcript.”

The team also revealed the system still has to undergo “rigorous safety assessments and testing” before it’s released to the public.

Stability AI also released a similar product last week, and ElevenLabs released its sound effects tool last month.
