Software engineers from Google's Research division have developed a machine-learning model that could change how video content is consumed by automatically cleaning up the audio in videos. That's according to a new blog post from the researchers, which highlights several video examples under the collective title "Looking to Listen at the Cocktail Party." The idea is to build algorithms around a very specific human ability: namely, the way people seem to soften background noises and distractions when listening to one person in a crowded or noisy space, such as a cocktail party. The algorithms are grounded in those audio-visual cues and, as the project currently stands, things seem to be going smoothly.
Far from only working with select videos that have multiple audio tracks, the researchers have been able to run ordinary, single-track videos through their software. The software infers where specific audio is coming from based on visual input, separates that audio into multiple tracks, and simultaneously cleans it up, removing unwanted background noise so that the desired sounds come through clearly. Judging by the various videos that have been run through the system, the neural-network-driven technology looks very promising. In fact, Google hopes to incorporate tools built on this new model into future projects, including its current offerings.
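The core idea of separating a mixed soundtrack into per-speaker tracks is often framed as predicting a mask over the mixture's spectrogram for each speaker. As a rough illustration only, the sketch below applies oracle ratio masks to a synthetic mixture; in the actual system, a neural network would predict these masks from the combined audio and visual (face) features, and all names and shapes here are illustrative assumptions, not Google's implementation.

```python
import numpy as np

def separate_with_masks(mixture_spec, masks):
    """Apply one ratio mask per speaker to a mixture spectrogram,
    yielding one estimated spectrogram per speaker."""
    return [mask * mixture_spec for mask in masks]

# Toy stand-ins for two speakers' magnitude spectrograms
# (64 frequency bins x 100 time frames), purely synthetic.
rng = np.random.default_rng(0)
s1 = rng.random((64, 100))
s2 = rng.random((64, 100))
mix = s1 + s2  # the single-track "video audio"

# Oracle ratio masks: each bin's share of the mixture energy.
# A trained model would predict these from audio + face crops.
eps = 1e-8
m1 = s1 / (mix + eps)
m2 = s2 / (mix + eps)

est1, est2 = separate_with_masks(mix, [m1, m2])
```

Because ratio masks sum to (almost exactly) one in every time-frequency bin, the estimated tracks add back up to the original mixture, which is one reason masking is a popular formulation for this kind of separation.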
It's not immediately clear how widely those tools would reach the average consumer. The most obvious use case applies to YouTube's content creators, who could clean up the audio on their channels' videos far more quickly and easily. They could also feasibly capture content with fewer environmental limitations. Another implementation seems at least as likely: audio isolation based on visual cues might be most useful in the company's enterprise offerings, particularly its video communications platforms. However, it's probably best to wait for an official feature announcement before jumping to any such conclusions. In the meantime, the video examples have been included below for anybody wanting to check them out.