The scary future of fake news: Perfect-quality CGI audio and video

The Economist has a rather disturbing article about how advances in “generative adversarial networks” will soon make it possible to create computer-generated audio and video footage that is indistinguishable from the real thing. The potential for spreading misinformation is obvious. The article offers some ways that such fakes could be spotted:

‘Yet even as technology drives new forms of artifice, it also offers new ways to combat it. One form of verification is to demand that recordings come with their metadata, which show when, where and how they were captured. Knowing such things makes it possible to eliminate a photograph as a fake on the basis, for example, of a mismatch with known local conditions at the time.

…Amnesty International is already grappling with some of these issues. Its Citizen Evidence Lab verifies videos and images of alleged human-rights abuses. It uses Google Earth to examine background landscapes and to test whether a video or image was captured when and where it claims. It uses Wolfram Alpha, a search engine, to cross-reference historical weather conditions against those claimed in the video. Amnesty’s work mostly catches old videos that are being labelled as a new atrocity, but it will have to watch out for generated video, too. Cryptography could also help to verify that content has come from a trusted organisation. Media could be signed with a unique key that only the signing organisation—or the originating device—possesses.’
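To make the metadata check concrete, here is a minimal sketch of the kind of first-pass screening a verifier might run. It assumes a JPEG with intact EXIF data and uses the Pillow library; the file name is hypothetical.

```python
# Minimal sketch: pull the claimed capture time and GPS tags out of a
# photo's EXIF metadata so they can be compared against known local
# conditions (weather, daylight, geography) at that time and place.
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def extract_capture_metadata(path):
    """Return the EXIF capture timestamp and GPS tags, if present."""
    exif = Image.open(path)._getexif() or {}
    named = {TAGS.get(tag, tag): value for tag, value in exif.items()}
    gps_raw = named.get("GPSInfo", {})
    gps = {GPSTAGS.get(tag, tag): value for tag, value in gps_raw.items()}
    return named.get("DateTimeOriginal"), gps

when, where = extract_capture_metadata("suspect_photo.jpg")  # hypothetical file
print("Claimed capture time:", when)  # e.g. '2017:07:04 14:22:31'
print("Claimed GPS tags:", where)     # latitude/longitude, if embedded
# A mismatch with known conditions can expose a fake, but the converse
# proves nothing: EXIF fields are trivially edited or stripped.
```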
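The cryptographic idea can be sketched just as briefly. Below, a signing organisation (or originating device) signs the raw bytes of a file with an Ed25519 private key, and anyone holding the matching public key can confirm the bytes haven’t changed since signing. This uses the third-party `cryptography` package; the file name is again hypothetical, and a real deployment would need the key-distribution and revocation machinery this omits.

```python
# Minimal sketch: sign a media file's bytes with a private key held only
# by the originating organisation or device, then verify with the public key.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()  # kept secret by the signer
public_key = private_key.public_key()       # published for verifiers

with open("footage.mp4", "rb") as f:        # hypothetical file
    video_bytes = f.read()
signature = private_key.sign(video_bytes)   # distributed alongside the footage

try:
    public_key.verify(signature, video_bytes)
    print("Signature valid: bytes are exactly what the key holder signed.")
except InvalidSignature:
    print("Signature invalid: file altered or not from the claimed source.")
```

Of course, a valid signature only proves the file came from whoever holds the key, which is exactly why stolen keys are such a problem.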

However, it would be naive to think that these methods couldn’t be defeated by better CGI algorithms, by forged file metadata, or by stolen cryptographic keys.

And even if the “good guys” manage to stay one step ahead forever, we’re still rapidly approaching an era where forgeries will be so good that unaided human eyesight and hearing won’t be sensitive enough to detect them. Humans will have to rely on machines to tell them what is real and what is fake (itself an interesting state of affairs from a philosophical standpoint, but that’s a topic for another time). A few fragments of aberrant computer code embedded in an otherwise perfect-looking fake video might be the only thing that reveals the lie. Given the short attention spans and low levels of scientific and technological literacy in most countries, how could the computer forensic findings in such a case ever be explained to average people?

They couldn’t, which means that belief or disbelief in accusations of forgery will twist in the winds of each person’s preexisting biases, just as it does now. Americans will believe their government when it tells them a video originating in Russia is fake, and Russians who mistrust America will reflexively disagree and believe their own government’s claim that it is genuine. The truth will of course be out in the open, but so abstruse that only a small minority will be able to see it clearly for themselves.

Moreover, the ability to make perfect computer-generated audio and video imitations of people could lead to disaster in crisis situations, where the intended target lacks either the ability or the time to verify authenticity with technology of their own. Imagine a military battle in which one side transmits false orders to the other in the voice of the latter’s commander, or a hacker who, posing as a rich investor, calls a stockbroker and insistently orders a trade of some massive number of shares.

Update (7/13/2017): Computer scientists at the University of Washington have developed a way to merge audio recordings of a person speaking with video footage of them, so that the mouth appears to move in sync with the words even though the audio and video come from two different sources. They demonstrated the technique by manipulating a speech by Barack Obama.

Links

https://www.economist.com/news/science-and-technology/21724370-generating-convincing-audio-and-video-fake-events-fake-news-you-aint-seen