One aspect of artificial intelligence that has not received much attention is its ability to create fake audio and video so realistic that it is difficult to tell from the real thing. The advent of Photoshop made us doubt what we see; what happens when we can’t trust our senses?
The most recent example of AI’s audio-visual magic comes from the University of Washington, where researchers have created a new tool that takes audio files, converts them into realistic lip movements, and then turns those into video. The result is a video of someone saying something they didn’t say. It sounds complicated, but you can watch the video below to better understand the process.
You can see two side-by-side videos of former US president Barack Obama. The video on the left is the source of the original audio, while the video on the right is from a completely different speech, onto which the researchers used an algorithm to graft mouth shapes generated from that audio. The fake video isn’t perfect (Mr. Obama’s mouth movements are a bit blurry, a common flaw in AI-generated images), but at first glance it looks very convincing.
The researchers say they chose Mr. Obama as an example because there are many high-quality videos of the former president, which makes training the neural network easier. Researcher Ira Kemelmacher said they needed 17 hours of video data to track and copy mouth movements, but in the future that could be reduced to one hour.
The team behind this research says they hope to use it to improve video chat tools like Skype. Users could supply a video of themselves speaking to train the software. Then, when they need to talk to someone, the video could be generated automatically from their voice. This would help when the network connection is poor or when you want to save mobile data.
Of course, there are concerns that this tool will be used to create fake videos, cause confusion, and spread fake news. Combine it with technology that can recreate anyone’s voice from just a few minutes of sample audio, and those concerns grow. Similar research can also alter facial expressions in real time and create 3D models of human faces from a few photos.
The research team from the University of Washington is keen to distance itself from such improper uses, making it clear that they trained the neural network only on Mr. Obama’s voice and video. “It’s impossible to take anyone’s voice and turn it into Mr. Obama’s video,” said professor Steve Seitz. “We will not put one person’s words into someone else’s mouth.” But in theory, this technology could map anyone’s voice onto anyone’s face.
Watch the AI-generated video of Mr. Obama’s speech below.