How Algorithms Create and Prevent Fake News
Exploring the Impacts of Social Media, Deepfakes, GPT-3, and More
―
Noah Giansiracusa

Copyright © 2021 by Noah Giansiracusa
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
Cover designed by eStudioCalamar
Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, New York, NY 10004. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
Dedicated to my wife Emily and our parents: Bob, Dorothy, Andy, and Carole.
Contents
Chapter 2: Crafted by Computer
Chapter 3: Deepfake Deception
Chapter 8: Social Spread
Chapter 9: Tools for Truth
He has dozens of publications in math and data science and has taught courses ranging from a first-year seminar on quantitative literacy to graduate machine learning. Most recently, he created an interdisciplinary seminar on truth and lies in data and algorithms that was part of the impetus for this book.
Chapter 1 sets the stage by exploring the economics of blogging and online newspapers, with an emphasis on the dynamics that have led to a proliferation of low-quality journalism. Data, in the form of clicks and pageviews, has transformed the news industry, and you’ll see how fake news peddlers have taken advantage of this. Chapter 2 looks at a new development in our ongoing battle to understand what’s real and what’s not: fake journalists with untraceable lifelike profile photos synthesized by AI, and entire articles written by AI
Chapter 1: Pageview
The Data-Driven Economics of Online Journalism
and fake news. By tracing the financial incentives involved in the contemporary news cycle, I hope in this chapter to convey the alarming extent to which data, unseen by most of us yet created by our actions and activities, is fundamentally shaping what we read every day and threatening the bulwark of traditional journalistic standards.
But why did I write “ostensibly” in the preceding paragraph? Well, there is somewhat of a Ponzi scheme dynamic at play here. Advertising revenue tends to be relatively low even for popular blogs, so the real ambition of most blogs, even if they don’t admit it, is to gain sufficient popularity and traffic that a larger organization will buy them out and incorporate the blog into its larger website in order to increase traffic—often so that the larger website can boost its odds of being bought by a yet larger organization.
For example, Nate Silver’s technical yet surprisingly popular blog on political polls was launched in 2008, brought into the New York Times in 2010, acquired by ESPN in 2013, then transferred to the sister property ABC News in 2018. Arianna Huffington’s groundbreaking general news blog the Huffington Post was founded in 2005 with a one million dollar investment and sold to AOL in 2011 for three hundred and fifty million—but, quite tellingly, at the time of this sale, its ad revenue was only thirty-one million dollars per year. This roughly tenfold ratio of purchase price to annual revenue is rather extreme and suggests that AOL was banking on continued long-term growth as well as other factors like the prestige of adding such a popular online newspaper and bringing
This blogger remuneration system is blatantly reductionist: the reader’s opinion of a blog post is irrelevant. In fact, it does not even matter whether the reader actually reads the post—once the link to a post is clicked, the pageview is recorded, and that’s all that counts. An unfortunate but largely predictable consequence has been the proliferation of clickbait: catchy, often trashy, headlines that encourage clicks rather than bespeaking quality journalism.1 A lengthy, methodically researched and fact-checked article provides no more financial value than a piece of vapid tabloid trash. This oversimplifies the situation as many readers follow certain blogs precisely because they consistently post high-quality articles, but many readers also click whatever stories are catchiest when scrolling through social media or news aggregators, and in these latter settings the name and reputation of the blog/organization is often a secondary factor in the decision to click—it is the headline that matters most.
An additional, and significant, dynamic is that blog posts tend to have short-lived pageview-generating lifespans. Consequently, bloggers and blogs, in their constant quest for increased traffic, are under intense pressure to produce as many posts as possible, as rapidly as possible. A traditional print newspaper had to produce content that filled one print edition per day; a cable news network has to produce content that fills twenty-four hours a day, three hundred and sixty-five days a year.
Putting these observations all together, we see the perfect storm of conditions assaulting the foundations of journalism. Blogs and bloggers are almost all financially strapped, earning far less revenue than an outsider might expect, and so are in desperate need of more pageviews—whether to earn ad revenue directly or to raise the prospect of a lucrative buyout. This drives them to produce articles far too quickly, leaving precious little time to fact-check and verify sources. Even if they had time to fact-check, the pageview statistics they obsess over show that there is no real financial incentive for being truthful, as misleading articles with salacious headlines often encourage more clicks than do works of authentic journalism.
And let me be abundantly clear about this: it is the data-driven impetus of the blogging industry, and the vast oversimplification and distortion of multidimensional journalistic value caused by reducing everything to a single, simple-minded, superficial metric—the pageview—that is most responsible for this dangerous state of affairs. That some pageview-driven blogs thrive on thoughtful, methodical, accurate writing is truly remarkable in this market that is saturated with perverse incentives pressuring writers to engage in the exact opposite of these noble qualities. Let us all be thankful for the good blogs and good writing when we see them; they are certainly out there, but they struggle to rise above the ubiquitous clickbait filth pervading the internet.
The propagation direction I haven’t yet directly addressed, despite claiming it is the one most responsible for our current morass of media mendacity, is the upward flow where stories start in small, typically special interest and/or geographically local blogs, and manage to work their way up the food chain, sometimes ending all the way at the top on national news sites. The questions we must ask here are: how and why does this happen, and why does this lead to less truthful news? The answers, as I next discuss, all essentially follow from the pageview economics of blogging.
A recent study4 by Harvard researchers on a disinformation campaign concerning mail-in voter fraud in the 2020 election details specific examples of fake news stories that originated in lower-tier publications with minimal editorial standards then launched upward through the system, spreading horizontally as they did so. For instance, a New York Post article from August 2020 relied on uncorroborated information from a single anonymous source, supposedly a Democratic operative, who claimed to have engaged in all sorts of voter fraud for decades to benefit the Democrats. Shortly afterward, versions of this story were put out by the Blaze, Breitbart, Daily Caller, and the Washington Examiner, and it eventually reached Fox News where it was covered on Tucker Carlson’s show and on Fox & Friends. The Harvard researchers even argue, though without too much quantitative evidence, that popular news outlets are more to blame for the viral spread of disinformation than the much-maligned social media—at least in the specific context of discrediting the results of the 2020 presidential election. I’ll revisit this topic in Chapter 8.
There are some signs of hope, however. Just as the New York Times ushered in the print subscription model at the turn of the 20th century, the Wall Street Journal ushered in the online subscription model (the paywall) at the turn of the 21st century, a move that has been followed by the New York Times, the Washington Post, and many other highly reputed news organizations—and with great success at righting many of the earlier period’s wrongs, one might argue. Readers pay monthly fees to these organizations in order to access and support quality journalism.
Fortunately, even in the realm of freely available blogs, there are glimmers of light. For instance, in the early days of the COVID-19 pandemic, a lengthy, technical, and well-researched blog post8 ended up drawing over forty million reads and possibly played an important role in shifting the political discourse on how governments should respond to the pandemic. This article was the exact opposite of clickbait, and it shows that in the right context genuine substance is capable of drawing pageviews at astonishing numbers. Just as many environmentally or socially oriented consumers now choose where to shop based on the views and values of the companies they buy from, perhaps news consumers are ready to recognize pageviews as influential currency and spend them more meaningfully and thoughtfully.
Before you become too sanguine, however, I’d like to relate some specific tales of pageview journalism driving the spread of fake news and shaping our political reality.
site’s traffic came from the eight hundred thousand followers they acquired on Facebook during this period. At its peak, their monthly revenue reached upwards of forty thousand dollars. Prior to this venture, they were both unemployed restaurant workers.
reality. Blair invented far-right, and far-fetched, stories about “California instituting sharia, former president Bill Clinton becoming a serial killer, undocumented immigrants defacing Mount Rushmore, and former president Barack Obama dodging the Vietnam draft when he was nine.” While doing this, he realized that “The more extreme we become, the more people believe it.”
The disappearance of local newspapers has also been taken advantage of more directly through deliberate subterfuge. At the end of 2019, the Columbia Journalism Review (CJR), expanding on stories first reported elsewhere,
12 Priyanjana Bengani, “Hundreds of ‘pink slime’ local news outlets are distributing algorithmic stories and conservative messaging,” Columbia Journalism Review, December 18, 2019.
The disappearance of genuine local news organizations—a significant loss in American media, triggered largely by the economics of the internet—has produced a vacuum that’s been filled in unscrupulous ways. This has created a more polarized nation and fanned the flames of fake news.
Summary
The intense competition for ad revenue also encourages journalists to take shortcuts by spending their time scouring blogs and papers for stories rather than doing direct investigations. This results in a vertical propagation in which fake news can slip into the system at the bottom in blogs or low-level newspapers with minimal editorial standards and then work its way up to the top.
The subscription model has been returning to some newspapers, in the online form of a paywall, but plenty of free papers supported by ad revenue remain. Moreover, a long-term consequence of the changing technological and economic landscape of journalism is the stark contraction of regional newspapers, which shows no signs of abating. Opportunistic political propagandists and professional fake news peddlers have been rapidly filling this void with deceptive papers that appeal to people’s old-fashioned trust in local news.
Chapter 2: Crafted by Computer
Artificial Intelligence Now Generates Headlines, Articles, and Journalists
What makes deepfake profile photos so dangerous compared to simply grabbing a real person’s photo from the Web and relabeling it is that when a real photo is used, one can often find the original—thereby revealing the deception; a synthesized deepfake photo has no original out there to be found.
An Opinion Editor at the Times of Israel pointed out that even if Taylor’s articles themselves did not have much impact, the deepfake technology providing his fake persona with an untraceable profile photo already risks “making people in her position less willing to take chances on unknown writers.” In other words, the threat of deepfakes can be more powerful than the deepfakes themselves. We will see throughout this book that this situation is not uncommon: the disruption AI unleashes on society is caused not just by what has been done at large scale, but also by what nefarious activities could now potentially be achieved at scale. That said, deepfake-synthesized profile photos are not just an idle, theoretical threat faced by newspapers; since the Oliver Taylor incident, illicit use of this technology has spread rapidly, and, as experts initially feared, it is now a central part of many weaponized disinformation campaigns.
In December 2019, Facebook announced that it had removed a network of hundreds of accounts with ties to the far-right newspaper the Epoch Times that is an outgrowth of the new religious movement Falun Gong. This network included over six hundred Facebook accounts and dozens of
In September 2020, Facebook and Twitter both announced6 that they had removed a group of accounts that were spreading disinformation about racial justice and the presidential election aimed at driving liberal voters away from the Biden-Harris ticket. These accounts were operated by the Russian government, and they utilized deepfake profile photos. Facebook’s Head of Cybersecurity Policy said that “Russian actors are trying harder and harder to hide who they are and being more and more deceptive to conceal their operations.” The Russian agents set up a fake news site and recruited “unwitting freelance journalists” to write stories that were then shared by the fake social media accounts. This was the first time that accounts with
established links to Russia’s notorious Internet Research Agency (which largely came into public awareness for its efforts to influence the outcome of the 2016 US election) were found to have used deepfake profile photos.
One month later, it was discovered that a fictitious persona using a deepfake profile photo was instrumental in a viral fake news conspiracy story about Joe Biden’s son, Hunter Biden. A sixty-four-page forged intelligence document supposedly linking Hunter Biden to shady business dealings in China was widely circulated in right-wing channels on the internet and by close associates of President Trump on social media. The author of this document was a Swiss security analyst named Martin Aspen who… did not exist. Disinformation researchers found7 that he was a fabricated identity who relied on a synthesized deepfake profile photo. The viral spread of this forgery helped lay the foundations for the ensuing developments in the fake Hunter Biden conspiracy theory, peddled most ardently by Rudy Giuliani, that gained a considerable following leading up to the 2020 presidential election.
Automated Headlines
In June 2020, it was announced8 that dozens of news production contractors at Microsoft’s MSN were sacked and replaced by AI. These contractors did not report original stories, but they did exercise some editorial control—they were responsible for “curating” stories from other news organizations (the vertical and horizontal propagation discussed in the previous chapter), writing headlines, and selecting pictures to accompany the articles. The contractors’ duties are now performed by algorithms that identify trending news stories and “optimize” content by rewriting headlines and adding photographs. It’s not clear what optimize means here, other than that the algorithm needs a concrete objective to strive for, and this is most likely the coveted pageview or one of its closely related cousins.
The most powerful, flexible, and highly lauded AI product for generating text was developed by a research lab called OpenAI. This lab was launched as a nonprofit in 2015 by Elon Musk and others with a billion-dollar investment; then in 2019 it added a for-profit component to its organization with another billion-dollar investment—this time from a single source: Microsoft. OpenAI has created a variety of AI products, but the one that has grabbed the most headlines is its text generation software GPT, an acronym for Generative Pre-trained Transformer, a technical name that need not concern us.
GPT refers to a sequence of products: the original GPT came out in 2018 to limited fanfare; then a year later, GPT-2 was released10 and reached a whole new level of capability; and just one year after that, the current state-of-the-art version, GPT-3, arrived.
One of the first and most important questions to ask about GPT is how similar the text it produces is to text written by humans. In August 2019, two scholars published a study12 in Foreign Affairs to see whether “synthetic disinformation,” in the form of nonfactual text generated by GPT-2, could “generate convincing news stories about complex foreign policy issues.” Their conclusion: while not perfect, it indeed can. Their study opens with a superficially plausible but entirely made-up passage generated by GPT-2:
11 Nick Statt, “Microsoft exclusively licenses OpenAI’s GPT-3 text generation model,” The Verge, September 22, 2020.
The authors of this study wanted to test empirically how convincing passages such as this one really are. They fed GPT-2 the first two paragraphs of a New York Times article about the seizure of a North Korean ship and had it extend this to twenty different full article-length texts; by hand they then selected the three most convincing of the twenty GPT-2 generated articles (the paragraph above is taken from one of these generated texts). They conducted an online survey with five hundred respondents in which they divided the respondents into four groups: three groups were shown these hand-selected GPT-2 generated articles, while the remaining group was shown the original New York Times article.
They found that eighty-three percent of the respondents who were shown the original article considered it credible, while the percentage for the three synthesized articles ranged from fifty-eight percent to seventy-two percent. In other words, all three GPT-2 articles were deemed credible by a majority of their readers, and the best of these was rated only a little less credible than the original article. The respondents were also asked if they were likely to share the article on social media, and roughly one in four said they were—regardless of which version of the article they had read.
complete this to a short article of about two hundred words.14 A collection of GPT-3 generated articles of this form was combined with a collection of human-written articles of comparable length, and the OpenAI researchers claim that human readers had an average accuracy of fifty-two percent for determining which articles were GPT-3 and which were human. In other words, people did only marginally better than they would have just by randomly guessing with a coin toss.
Of course, the OpenAI researchers likely designed this study to produce results as impressive as possible. If they had used longer articles, the differences between human and machine would probably have emerged more prominently. Also, the human readers were low-paid contract workers recruited from Amazon’s crowdsourcing marketplace Mechanical Turk, so they were not a representative sample of the public, and they didn’t have any motivation to put much time or effort into the task—quite the opposite, actually: they were paid more the faster they clicked through their tasks. I wonder what the accuracy would have been if they had recruited, say, readers of the New York Times and given them a small reward for each article that was successfully classified. Nonetheless, this experiment suggests that we’re already at the point where AI can write short articles that are at least superficially convincing to many readers, and the technology is sure to continue improving in the near future.
16 Karen Hao, “A college kid’s fake, AI-generated blog fooled tens of thousands. This is how he made it.” MIT Technology Review, August 14, 2020.
Whether GPT-3 provides a significantly cheaper and faster way to produce effective fake news than the “old-fashioned” way of hiring low-paid freelancers on the internet (or teenagers in a Macedonia troll farm, as was the case in 2016) remains to be seen. The answer to this question—which largely depends on the price OpenAI charges customers—might determine how much GPT-3 will in fact fan the flames of fake news in the near future.
A glimpse into one of the surreptitious ways that GPT-3 is already being used was recently found on Reddit—and I strongly suspect similar behavior will soon spread to many other platforms and corners of online news/social media (if it hasn’t done so already without us noticing). Philip Winston, a software engineer and blogger, in October 2020 came across a Reddit post whose title was an innocuous but provocative question: “How does this user post so many large, deep posts so rapidly?” This post and the account of the user who made it were both later deleted, but Winston recalls22 that it essentially asked how a particular Reddit user was posting lengthy replies to many Reddit question posts within a matter of seconds. You probably already have a guess for the answer—and if so, you are correct.
Eager to resolve this matter, Winston found a subreddit discussing GPT-3 and posted in it asking if the experts there thought this suspicious user was a bot powered by GPT-3. Within minutes, his suspicion was confirmed as someone there pinpointed the specific product derived from GPT-3 that was almost surely being used. It was called Philosopher AI, and by relying on this instead of GPT-3 directly, the user was able not only to use GPT-3 without having been granted access but also to avoid the fees that a commercial user would ordinarily be required to pay. Winston alerted the developer of Philosopher AI to the situation, and the developer immediately blocked that particular user’s access.
Supervised Learning
We usually start with data in spreadsheet form, where the columns correspond to variables and the rows specify instances of these variables (in other words, each row is a data point). Each variable can be numerical (measuring a continuous quantity like height or weight or a discrete quantity like shoe size), or it can be categorical (in which each instance takes on one of a finite number of nonquantitative values, like gender or current state of residence). In the supervised learning framework, we first single out one variable as the target (this is the one we will try to predict, based on the values of the others); all the other variables are then considered predictors.24 For example, we might try to predict a person’s shoe size based on their height, weight, gender, and
state of residence (a numerical prediction like this is called regression), or we might try to predict a person’s gender based on their height, weight, shoe size, and state of residence (a categorical prediction like this is called classification).
There are a handful of popular supervised learning algorithms, most of which were largely developed in the 1990s. Each algorithm is based on assuming the overall manner in which the target depends on the predictors and then fine-tuning this relation during the training process. For instance, if you want to predict shoe size, call it y, based on height and weight, call those x1 and x2, and if you expect the relationship to be linear, then you can use a linear algorithm that starts with an equation of the form y = a1x1 + a2x2 + c, where a1, a2, and c are numbers called parameters that are “learned” in the training process. This means the algorithm is fed lots of rows of data from which it tries to deduce the best values of the parameters (“best” here meaning that, on average, the y values given by this linear formula are as close as possible to the actual values of the shoe size target variable).
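For readers who like to see this concretely, here is a minimal sketch in Python of the training step just described, using the scikit-learn library and a tiny invented dataset (all numbers are made up purely for illustration): fitting the model amounts to finding the parameter values a1, a2, and c that make the formula's predictions match the observed shoe sizes as closely as possible on average.

```python
# A minimal illustration of supervised learning as described above:
# predict shoe size (the target) from height and weight (the predictors)
# with a linear model y = a1*x1 + a2*x2 + c. The data here is invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one person: [height in cm, weight in kg]
X = np.array([
    [160, 55],
    [170, 70],
    [180, 80],
    [175, 68],
    [190, 95],
])
y = np.array([37, 41, 44, 42, 47])  # shoe sizes (EU), the target variable

model = LinearRegression()
model.fit(X, y)  # "training": learn the parameters a1, a2, and c from the rows

print("learned parameters a1, a2:", model.coef_)
print("learned intercept c:", model.intercept_)
print("predicted shoe size for 178 cm, 75 kg:", model.predict([[178, 75]])[0])
```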
been as much of an art as a science, and a holy grail in the subject has long been to find ways of automating this process. This brings us to our next topic in machine learning.
Deep Learning
After GPT-3 finished reading through its massive training data set of text a sufficient number of times, it locked the values of all its internal parameters and was then ready for public use (at least, for those granted access). Each user can input a block of text, and the algorithm will generate text to extend it as long as one would like. Internally, the algorithm takes the original input, predicts a likely next word, appends it, and then repeats the process on the growing block of text, one word at a time, for as long as the user desires.
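GPT-3 itself sits behind OpenAI's restricted commercial API, but the autocomplete loop just described can be illustrated with its freely available predecessor, GPT-2, through the open source Hugging Face transformers library. This is only a sketch of the idea, not the system OpenAI runs, and the prompt here is invented:

```python
# Illustrative only: GPT-2 (publicly downloadable) stands in for GPT-3,
# which is accessed through OpenAI's commercial API rather than open code.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A recent study by Harvard researchers found that"
result = generator(prompt, max_length=60, num_return_sequences=1)

# The model repeatedly predicts a likely next token and appends it,
# extending the prompt into a continuation of the requested length.
print(result[0]["generated_text"])
```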
Very broadly, we want to feed an algorithm a large collection of photos of human faces and have it learn from these how to produce new faces on its own. It is absolutely astonishing that this is now possible. We don’t want to have to explicitly teach the algorithm that human faces generally have an oval shape with two ears on either side, two eyes, one nose in the middle, one mouth below that, etc., so we will rely on deep learning to automatically extract this high-level understanding directly from the data.
For text generation, we were able to piggyback off of supervised deep learning in a rather straightforward way—by reading text and attempting to predict each word as we go. For image generation, this doesn’t really work too well. While GPT-3 produces text that is quite convincing on a small scale (each sentence looks grammatical and related to the surrounding sentences), it tends to lose the thread of coherence over a larger scale (narrative contradictions emerge, or, for instance, in a story the villain and hero might spontaneously swap). This limitation often goes unnoticed by a casual reader. But large-scale coherence is absolutely crucial for image tasks such as synthesizing photographs of faces: a GPT-3 type approach would likely lead to globs of flesh and hair and facial features that seem organic in isolation but which constitute hideous inhuman monstrosities on the whole—the wrong number of eyes, ears in the wrong place, that kind of thing.
This is where a clever deep learning architecture comes in: the generative adversarial network (or GAN for short). The basic idea is to pit two self-supervised deep learning algorithms against each other. The first one, called the generator, tries to synthesize original faces—and it needs no prior knowledge, it really can just start out by producing random pixel values—whereas the second one, called the discriminator, is always handed a collection of images, half of which are real photos of faces and half of which are the fake photos synthesized by the generator. During the training process, the generator learns to adjust its parameters in order to fool the discriminator into thinking the synthesized images are authentic, but simultaneously the discriminator learns to adjust its parameters in order to better distinguish the synthetic images from the authentic ones.
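Here is a bare-bones PyTorch sketch of that adversarial tug-of-war. The tiny networks, learning rates, and training step below are invented for illustration; real face synthesizers are enormously larger and, as the next paragraph explains, much trickier to keep balanced.

```python
# A bare-bones sketch of the generator/discriminator setup described above,
# operating on flattened 64x64 grayscale "images" scaled to [-1, 1].
# Real face-synthesis GANs (e.g., StyleGAN) are vastly larger than this.
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),          # outputs a fake "image"
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),             # probability the input is real
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def training_step(real_images):          # real_images: (batch, img_dim) tensor
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the discriminator to tell real photos from synthesized ones.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise).detach()
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the generator to fool the discriminator into answering "real".
    noise = torch.randn(batch, latent_dim)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```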
The training process is quite delicate, much more so than for traditional supervised learning, because the two algorithms need to be kept in balance. But throughout the seven years that GANs have existed, progress in overcoming this and many other technical challenges has been rapid and breathtaking. The links provided earlier in this chapter give you the opportunity to see the outputs from state-of-the-art facial photo-generating GANs. And, as with essentially all topics in deep learning, there are no signs of this rapid progress abating. It is both exciting and frightening to think of what this technology might be capable of next.
researchers also included a new detector they developed internally aimed specifically at the most recent and popular deepfake photo synthesis system.
Two years ago, deepfake photos of nonexistent people first started being employed to cover the tracks of fake personas writing and sharing questionable news articles. Now, this is a standard technique in disinformation campaigns reaching all the way to Putin’s orbit, and it played a key role in the false Hunter Biden conspiracy that Trump and his allies tried to use to swing the 2020 election. These deepfake photos are cheap and easy to create, thanks to a recent deep learning architecture involving dueling neural networks. Google and Microsoft are both developing AI-powered tools for detecting when a photo is a deepfake, but this is a technological arms race requiring constant vigilance.
Deep learning also powers impressive language generation software, such as the state-of-the-art GPT-3—a massive system for autocompleting text that can convincingly extend headlines into full-length articles. Here, minor instances of illicit use have been uncovered, but a large-scale weaponized use in a disinformation campaign has not yet surfaced. It remains to be seen whether that’s because the developers of GPT-3 have kept access to the product closely guarded, or if it’s simply because fake news is so easy and fast to write by hand that the automation provided by GPT-3 doesn’t really change the equation. Only time will tell.
challenge is that, unlike its predecessor, GPT-3 is not open source: this makes it hard for researchers to build detection algorithms that are on par with GPT-3 itself. Once again, this is a technological arms race—but with the added challenge that training a state-of-the-art language generation algorithm costs many millions of dollars.
Throughout this chapter, the term “deepfake” referred to a synthetic photo. In the next chapter, we’ll animate these still photos and let them come to life by exploring deepfake movies and the fascinating role they play in the world of fake news.
Chapter 3: Deepfake Deception
In this era of fake news, the video was [...] showcasing an application of new artificial-intelligence technology that could do for audio and video what Photoshop has done for digital images: allow for the manipulation of reality.
have impacted politics and journalism, how the discord they sow relates to that of previous generations of image and video manipulation, and what legal and technological attempts are being made to rein them in.
Sounding the Alarm
A fake composite photo distributed by Joseph McCarthy’s staff placed Senator Millard Tydings in apparent conversation with Earl Browder, head of the American Communist Party, in an effort to taint Tydings with Communist sympathies; some believe this played a key role in Tydings’ electoral defeat in 1950. In 2004, during Senator John Kerry’s campaign for the Democratic presidential nomination, a fake composite photo appearing to show him standing together with Jane Fonda at an anti-Vietnam demonstration surfaced and was even reprinted in a New York Times article about Kerry’s erstwhile antiwar activities; when the original photographs were presented, some right-wing opponents falsely claimed that the Kerry-Fonda photo was the authentic one and the original separate photos were the forgeries. Images are powerful; altering images is a method to alter reality and history.
A commonly employed technique to produce disinformation with either photographs or videos is simply to mislabel content: claim an event in one place instead happened elsewhere, that one group of people is instead a different group, etc. After Trump’s repeated fearmongering over a group of Honduran migrants traveling through Mexico to the United States in 2018, there was a viral post on Facebook showing the bloodied face of a Mexican police officer with the caption “Mexican police are being brutalized by members of this caravan as they attempt to FORCE their way into Mexico.”
Sometimes, shallowfake editing can be quite subtle and borderline. In November 2018, a video clip went viral showing a confrontation at a Trump press conference between CNN reporter Jim Acosta and a female White House aide. The clip shows the aide reaching for the microphone held in Acosta’s right hand, and as she nears it, Acosta’s left arm forcefully pushes away the aide’s extended arm in an apparent act of physical aggression. (Incidentally, the context of this confrontation was that Acosta challenged the president’s characterization of the migrant caravan moving through Mexico—the one mentioned just a few paragraphs earlier—as an “invasion,” and after some verbal sparring, Trump responded by angrily declaring “that’s enough” as an indication for the White House aide to regain control of the microphone.) This clip was originally tweeted by Paul Joseph Watson, an editor at InfoWars (a conspiracy theory channel I’ll come back to in the next chapter on YouTube), and it was soon provided an air of legitimacy and officiality when the White House press secretary Sarah Huckabee Sanders retweeted it as proof that Acosta “put his hands on a young woman just trying to do her job.” Not only that, but Sanders used this video as grounds for temporarily revoking Acosta’s White House press pass.
What is shallowfake about this Acosta clip? Some observers thought the clip appeared to be sped up slightly at the moment when Acosta’s arm is heading toward the aide’s outstretched arm, transforming an abrupt but not necessarily aggressive motion into more of a mild karate chop. Other viewers noted that the clip maybe wasn’t sped up but that it seemed to switch to a low frame rate animated GIF format when it zoomed in at the crucial moment—and the low frame rate made Acosta’s arm motion appear more sudden and forceful than it was in the original unedited video. Did Watson knowingly and purposely use the animated GIF format for this misleading effect, or was this an unintentional by-product? Animated GIFs are a popular video format on social media, but usually they are not the preferred format when high-quality details are important. CNN executives said the video was “actual fake news,” while Watson denied doctoring or editing the video other than zooming in.
One of the oldest tricks in the book is to quote someone out of context, and unsurprisingly this simple technique also rears its head in video editing where it can perhaps be viewed as another form of shallowfake. Leading up to the 2020 presidential election, Marjorie Taylor Greene—notorious at the time as an incoming US Representative whose campaign was largely based around the bizarre QAnon conspiracy theory—tweeted7 a video clip that she captioned with the following text: “Joe Biden Said On Video That Democrats Built the Biggest ‘Voter Fraud’ Operation in History. We’re seeing it on full display right now!” The clip originated from a Republican National Committee official and was quickly posted by Eric Trump and the White House Press Secretary, among others. In the clip, Biden indeed speaks of putting together “the most extensive and inclusive voter fraud organization in the history of American politics.” However, it is clear from the original full context that he was referring to an organization to prevent voter fraud, but of course this viral clip deliberately made it seem otherwise.
The stage is now set for the entrance of our familiar protagonist: AI.
Next, a simple but powerful and popular deep learning architecture called an autoencoder is used. In general, an autoencoder has multiple neural network layers that first get progressively narrower (these form the encoder portion of the autoencoder) and then progressively widen back to the original size (this second half is the decoder). This is trained on the self-supervised task of reconstructing its own input: each face image is squeezed through the narrow middle layers and then rebuilt, which forces those middle layers to capture, as a short list of numbers, the essential features of the face.
For concreteness, let’s pretend one of the numbers the autoencoder discovers measures how much the lips are smiling. When Gadot has a big smile, this number will be large—and since we use the same encoder for both Gadot and Cage, we know that the Cage decoder will interpret this large number as a large smile on Cage. This happens simultaneously for everything essential about their faces: what direction their eyes are looking, whether their mouth is open and how much and in what shape, etc. In this way, Cage’s pasted-in face will closely match the expression of Gadot’s original in each frame.
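A skeletal sketch of this shared-encoder, two-decoder arrangement looks something like the following; the module names and sizes here are invented for clarity and are not taken from any particular face-swap tool.

```python
# Illustrative sketch of the shared-encoder / two-decoder face-swap idea
# described above (flattened 64x64 color face crops; sizes are arbitrary).
import torch.nn as nn

face_dim, code_dim = 64 * 64 * 3, 128

# One encoder, shared by both people, compresses a face crop down to a short
# vector of "essential" facial numbers (smile, gaze direction, mouth shape, ...).
encoder = nn.Sequential(nn.Linear(face_dim, 512), nn.ReLU(),
                        nn.Linear(512, code_dim))

# Two decoders: each learns to rebuild the face of one specific person
# from that shared code.
decoder_a = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                          nn.Linear(512, face_dim), nn.Sigmoid())  # person A (e.g., Gadot)
decoder_b = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                          nn.Linear(512, face_dim), nn.Sigmoid())  # person B (e.g., Cage)

# Training (not shown): encoder + decoder_a reconstruct A's face crops, and
# encoder + decoder_b reconstruct B's, both with a pixel-wise loss.

def face_swap(frame_of_a):
    """Read A's expression with the shared encoder, then render it as B."""
    code = encoder(frame_of_a)     # expression and pose captured as a short code
    return decoder_b(code)         # B's face wearing A's expression
```

Running something like face_swap on each frame of the source clip, followed by the edge smoothing discussed next, is essentially the pipeline described above.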
All that remains is to smooth over the edges around the face where Cage was pasted in, but that’s standard image processing, so we can just use ready-made general-purpose software for that. However, if you remember, I mentioned at the beginning of this chapter that the GAN architecture used in the last chapter for synthesizing deepfake photos is also used for deepfake movie editing like face-swaps—but there are no GANs in what I’ve described here so far! Indeed, some face-swap algorithms do not use GANs, but many of the more sophisticated ones do.
The vast majority of deepfakes do not stray far from that first lecherous appearance on Reddit: a report11 in 2019 found that ninety-six percent of deepfakes on the internet were nonconsensual face-swap pornography, most of which used the faces of female celebrities. At the time of the report, these videos had amassed over one hundred million views. Part of why these pornographic face-swaps use celebrities is simply the predilection of the audience, but it is also that there is an abundance of footage of celebrities readily available on the internet that provides ample training material for the deep learning algorithms. Importantly, however, as the technology continues to develop, less and less training data is needed to achieve the same level of verisimilitude.
A more recent report12 found that the number of deepfakes on the internet has been growing exponentially, doubling approximately every six months. The organization behind these reports, DeepTrace Labs, identified nearly fifty thousand deepfake videos by June 2020. Of the targets in these deepfake videos, 88.9% were from the entertainment industry, including 21.7% from fashion and 4.4% from sports. Only 4.1% of the targets were from the business world and 4% from politics, but both these latter figures represent increases in the percentages over previous years.
To help raise awareness of deepfakes and their potential to wreak havoc in politics, in 2018 BuzzFeed News worked with actor/writer/director Jordan Peele to produce a rather polished, compelling, and striking deepfake video13 in which Barack Obama said, among other things: “We’re entering an era in which our enemies can make it look like anyone is saying anything at any point in time—even if they would never say those things. So, for instance, they could have me say things like, I don’t know, [...] President Trump is a total and complete dipshit.” This video made a big splash when it came out—and it succeeded in bringing awareness of deepfakes to a much wider segment of the public.
This Obama video is a type of deepfake called a reenactment. You can think of this as a form of puppeteering, where here Obama was the puppet and Peele was the puppeteer. Peele was videotaped reading the script, then his mouth was clumsily pasted onto Obama’s, then a deep learning algorithm that had been trained on footage of Obama speaking was used to upgrade this simple copy-and-paste into a seamless blending of Peele’s mouth with the rest of Obama’s face—thereby animating Obama’s entire face according to Peele’s oral motions. Many reenactment algorithms use GANs. In short, the generator does the blending on the simple copy-and-paste video, and the discriminator compares the result to clips of authentic speech; in this way, the generator learns how to make its output look like authentic speech. In addition to the visual editing, the BuzzFeed team also used deep learning to transform Peele’s voice into a convincing acoustic impersonation of Obama.14 This project took roughly fifty-six hours of computational time and was overseen by a video effects professional. The deepfake video app used was FakeApp.
puppeteer that the user also provides to the program. These Samsung videos tend to retain more resemblance to the puppeteer than those produced by deepfake methods that use training footage specific to the puppet.
On September 20, 2020, two ads were scheduled21 to air on Fox, CNN, and MSNBC in the DC region, one featuring a deepfake Vladimir Putin and the other featuring a deepfake Kim Jong-un. Both had the same message: America doesn’t need electoral interference because it will ruin its democracy all by itself. These ads were sponsored by a voting rights group and aimed to raise awareness of the fragility of American democracy and the need for Americans to actively and securely engage in the electoral process. The use of deepfakes here was not for deception; it was simply to grab the viewers’ attention and startle people into recognizing the technologically fraught environment in which the 2020 presidential election was to take place. The deepfakes were face-swaps created using open source DeepFaceLab software. Both ads included the following disclaimer at the end: “The footage is not real, but the threat is.” At the last minute, the TV stations all pulled the ads and didn’t immediately provide an explanation for this decision. One can surely imagine a natural hesitation about wading into these delicate deepfake waters. In the end, the ads only appeared on social media.
Arguably, the first direct use of deepfake technology in a political election occurred in India in February 2020. One day before the Legislative Assembly elections in Delhi, two forty-four-second videos of Manoj Tiwari, the leader of the Bharatiya Janata Party (BJP), were distributed across nearly six thousand WhatsApp groups, reaching roughly fifteen million people. In both videos, Tiwari criticized the rival incumbent political leader. In one video, he spoke in English, while in the other video he spoke a Hindi dialect called Haryanvi. Both videos were, in a sense, deepfakes. Tiwari first recorded the video in Hindi, his native tongue. Then, in partnership with a political communications firm called The Ideaz Factory, an impersonator recorded the audio for the English and Haryanvi versions of the speech. Finally, a “lip-syncing” form of reenactment deepfake that had been trained on other footage of Tiwari speaking was used to match his lip movements to the new audio.
India was also the site of a much more unequivocally repugnant usage of deepfakes that occurred two years earlier—and while it is not directly related to an election, it still has strong political undercurrents and ramifications. Rana Ayyub was a thirty-six-year-old Indian woman, an investigative journalist, and a practicing Muslim. She said23 she was often seen as anti-establishment and that she has been called “the most abused woman in India.” She explained that anything she posted on Twitter would result in thousands of replies, much of it hateful and threatening. She tried to ignore the trolls and continue going about her job, telling herself that the online hate and threats “would never translate into offline abuse.” But in April 2018, that changed.
An eight-year-old Kashmiri girl had been raped, leading to widespread outrage across the country. The BJP (yes, the same one just discussed above) was the ruling political party at the time and responded by organizing a reactionary march in support of those accused of perpetrating this heinous act. Ayyub was invited to speak on the BBC and Al Jazeera about “how India was bringing shame on itself by protecting child sex abusers.” Shortly afterward, a male contact in the BJP sent Ayyub an ominous message: “Something is circulating around WhatsApp, I’m going to send it to you but promise me you won’t feel upset.” What she then saw was a pornographic movie in which she appeared to be the star. The video was a face-swap deepfake. In Ayyub’s own words: “When I first opened it, I was shocked to see my face, but I could tell it wasn’t actually me because, for one, I have curly hair and the woman had straight hair.
In a striking parallel, something briefer but eerily similar happened in the United States just one year later. Friday, October 2, 2020, was one of the strangest and most confusing days in recent memory (and for 2020, that’s saying a lot). News broke in the morning that President Trump had tested positive for COVID-19, and in a matter of hours we found out that he wasn’t just positive, he was symptomatic—and then, that his condition was actually quite serious, he was going to be hospitalized. All the day’s events were shrouded in a veil of uncertainty and chaos largely caused by the lack of frank and transparent communication from the government. It was literally just weeks before one of the most important elections in American history, yet we did not know the true state of the president’s health, and suspicion quickly grew that things were much worse than the officials were telling us.
Then, at 6:31 p.m. that day, President Trump posted on Twitter an eighteen-second video address in which he said that he is heading to Walter Reed, but he reassured people that he thinks he is doing very well. The video looked
Let me turn now to one final example of real-world deepfakes in a political setting—this time showing a positive use of the technology. In July 2020, David France, an Oscar-nominated activist filmmaker, debuted on HBO a documentary called Welcome to Chechnya about the anti-LGBTQ purges that took place in Chechnya. He wanted to include interviews with survivors of these atrocities, but he knew that for their personal safety their identities must be concealed in the film—at the time, they were being hunted in their homeland and escaping the region through a network of safe houses. He felt the usual documentarian technique of blurring faces produced too much of an emotional disconnect between the speaker and the audience, so he instead used deepfake technology. The production team filmed individuals outside Chechnya, unrelated to the country’s purge, in a studio equipped with an array of cameras capturing their faces from many angles; then deep learning algorithms were used to blend these faces onto the faces of twenty-three Chechens in the film to provide them with new disguised faces—and hence anonymity. As reported28 in the New York Times, “In one of the film’s more
While convincing deepfakes still seem rare in the real world, both their usage and their quality have been accelerating swiftly. On March 10, 2021, the FBI issued an official alert29 boldly stating that “Malicious actors almost certainly will leverage synthetic content for cyber and foreign influence operations in the next 12-18 months,” and the alert specifies that deepfakes are the main form of synthetic content it is referring to here. If we develop tools for determining when videos are deepfakes, we could push back against these malicious efforts and also apply these tools the next time something like the Ali Bongo New Year’s address or Donald Trump Walter Reed clip or Myanmar confession arises. It is time now to look at the progress and challenges in developing such tools.
Detecting Deepfakes
algorithm,30 but it did not take long for deepfake creation algorithms to overcome this weakness and render this particular detection algorithm obsolete. A related but more recent approach31 that currently looks promising is to measure heartbeat rhythms and blood flow circulation, but it is only a matter of time before the deepfake creators learn how to get past this hurdle as well. That said, just because deepfake detection algorithms have a difficult task does not mean we shouldn’t bother trying; quite the opposite, it means we must move quickly and vigorously to stay on top of this ever-evolving challenge.
As noted32 in Scientific American, “Such ‘crafted’ deepfake videos are more likely to cause real damage, and careful manual post processing can reduce or remove artifacts that the detection algorithms are predicated on.”
33 “Facebook just released a database of 100,000 deepfakes to teach AI how to spot them,” MIT Technology Review, June 12, 2020.
A startup in the UK called Serelay developed an app that is similar to Truepic’s, except Serelay’s app does not store the full photo in its server, it only stores a small digital fingerprint of the photo obtained by computing about a hundred mathematical values for each image. One cannot reconstruct the full photo from this fingerprint, but the company claims36 that if even a single pixel in the photo has been modified, then the fingerprints will not match up. Of course, both the Truepic and Serelay services only work if one knows in advance that the validity of a particular photo might later be questioned—so while very useful in some realms, they do not address the ocean of questionable photos flowing through the rapid channels of social media every day. That said, one can envision a world in the not-too-distant future in which every smartphone by default uses a verification service like this, and then whenever someone posts a photo or video on a social media platform, the platform places a little check mark beside it if it passes the verification service.
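Serelay's actual fingerprinting method is proprietary, but the general idea, a compact digest computed from the pixels that no longer matches if anything is altered, can be sketched with an ordinary cryptographic hash standing in for the company's hundred mathematical values. The file names below are hypothetical.

```python
# Illustrative only: a cryptographic hash of the raw pixels stands in for
# Serelay's proprietary fingerprint. The point is that the digest is tiny,
# cannot be used to reconstruct the photo, and changes completely if even
# a single pixel is altered.
import hashlib
from PIL import Image

def fingerprint(path: str) -> str:
    """Return a short hex digest of an image's raw pixel values."""
    pixels = Image.open(path).convert("RGB").tobytes()
    return hashlib.sha256(pixels).hexdigest()

# At capture time the app would store only the fingerprint, not the photo.
original_fp = fingerprint("photo_at_capture.jpg")

# Later, anyone can recompute the fingerprint of a circulating copy and compare.
print("unmodified" if fingerprint("photo_in_question.jpg") == original_fp
      else "photo has been modified")
```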
Senator Marco Rubio from Florida has spoken multiple times about the threats posed by deepfake technology and encouraged legislative action. Senator Ben Sasse from Nebraska in December 2018 introduced a bill aimed at regulating deepfakes—the first of its kind—but a day later, the federal government shut down over a budgetary impasse, and Sasse’s proposed bill expired by the time the government reopened. Next, in parallel to the June 2019 House hearing on deepfakes that opened this chapter, a Representative for New York’s ninth congressional district, Yvette Clarke, introduced a different bill on deepfakes, more extensive than Sasse’s.
Clarke’s bill—drafted in collaboration with computer scientists, disinformation experts, and human rights advocates—would require social media companies to better monitor their platforms for deepfakes and researchers to develop digital watermarking tools for deepfakes, and it would criminalize the malicious use of deepfakes that harm individuals or threaten national security. One of the advisers on the bill, Mutale Nkonde, a fellow at the Data & Society Research Institute, said38 the bill was unlikely to pass through Congress in its original form but felt it important to introduce the bill regardless in order to make the first serious step toward legislative regulation of deepfakes: “What we’re really looking to do is enter into the congressional record the idea of audiovisual manipulation being unacceptable.” While the bill did indeed stall,
in February 2021 Clarke said39 she’s planning to reintroduce a revised version of the bill that she felt would gain more traction due to the new political environment after the 2020 election and the fact that the pandemic has led to an increase in social media usage: “the conditions [are] ripe for actually passing some meaningful deepfake legislation.”
While regulation at the federal level has stalled in Congress so far, at the state level there have been some interesting developments. In October 2019, California signed a law40 making it a crime to maliciously distribute or create “materially deceptive” media about a political candidate within sixty days of an election. (A doctored photo or video is considered deceptive if a “reasonable person” would have a “fundamentally different understanding or impression” of it compared to the original version.) The term deepfake does not appear in the text of this law, but the law has been nicknamed the “California Deepfake Law,” and indeed it is directly inspired by deepfakes and the threat they pose to the state’s democratic systems.
effort to police itself. On January 28, 2020, the US House Ethics Committee released an official memo41 titled “Intentional Use of Audio-Visual Distortions & Deep Fakes” that includes the following text:
A number of people have argued that the biggest threat from deepfakes is not the direct deception they are capable of—it is the general erosion of trust they cause in society and the cover they provide to nefarious individuals, who can now plausibly deny damning videographic evidence by simply crying deepfake.
As you surely remember, just one month before the 2016 presidential election, the Washington Post published an article accompanied by the now-notorious “Access Hollywood tape” from 2005 in which Donald Trump makes extremely lewd comments about women in off-camera audio that was recorded presumably without his knowledge. This story broke just two days before one of the presidential debates, and Trump responded by admitting he made the remarks caught on tape and apologized for them but also attempted to minimize their significance as “locker room banter.” One year later, Trump quite bizarrely and brazenly started claiming43 the audio on that tape was fake and that he didn’t say the words we heard.
Responding to this assertion in a CNN interview with Anderson Cooper, the soap opera actress Arianne Zucker who was the subject of some of Trump’s vulgar comments in the Access Hollywood tape had this to say: “I don’t know how else that could be fake, I mean, unless someone’s planting words in your mouth.” Access Hollywood responded as well: “Let us make this perfectly clear, the tape is very real. He said every one of those words.” Nonetheless, Trump reportedly said44 in multiple private conversations that he’s not sure if it was really him in the tape, and in January 2017 he told a senator he was “looking into hiring people to ascertain whether or not it was his voice.”
Summary
Deepfake video editing encompasses a wide range of methods for modifying video clips to change the words people say and the people who say them. It is powered by deep learning, most commonly the GAN architecture in which two algorithms are pitted against each other and, through the data-crunching training process, the generator learns to routinely fool the discriminator. This technology first appeared in 2017 when it was used to make nonconsensual pornography, and it now threatens society’s ability to discern the truth. Conspiracy theorists call legitimate videographic evidence (such as George Floyd’s murder by the police) into question by claiming it is deepfake, and corrupt politicians are now granted a powerful tool: they can dismiss incriminating clips as deepfake. Meanwhile, innocent journalists and politicians have had their reputations tarnished when their faces were deepfake-swapped into sexual clips. Algorithmically detecting deepfakes has proven challenging, though there is sustained effort in that realm and some glimmers of hope. Legislative attempts to limit the spread of deepfakes by regulating their usage have so far stalled at the national level; at the state level, there has been some concrete action, but the impingement on free speech such laws necessitate leaves their constitutionality in question. This chapter was all about the algorithms used to edit videos; in the next chapter, I turn to another algorithmic aspect of videos: YouTube recommendations.
Recommendations
As far-right and conspiracy channels began citing one another, YouTube’s recommendation system learned to string their videos together. However implausible any individual rumor might be on its own, joined together, they created the impression that dozens of disparate sources were revealing the same terrifying truth.
YouTube’s recommendation algorithm drives the majority of watch time on the site, so understanding how it works is crucial to understanding how YouTube has pushed viewers toward outlandish conspiracy theories and dangerous alt-right provocateurs. This chapter takes a close look at how the recommendation algorithm has developed over the years, how it behaves in practice, how it may have influenced elections and political events around the world, how the company has responded to criticism, and how it has tried to moderate the content it hosts.
Growing Chorus of Concern
“YouTube is something that looks like reality, but it is distorted to make you spend more time online. The recommendation algorithm is not optimizing for what is truthful, or balanced, or healthy for democracy.” This was said by Guillaume Chaslot, a former Google AI engineer who worked on YouTube’s recommendation algorithm. “On YouTube, fiction is outperforming reality,” Chaslot continued.5
“Bellingcat, an investigative news site, analyzed messages from far-right chat rooms and found that YouTube was cited as the most frequent cause of members’ ‘red-pilling’—an internet slang term for converting to far-right beliefs. A European research group, VOX-Pol, conducted a separate analysis of nearly 30,000 Twitter accounts affiliated with the alt-right. It found that the accounts linked to YouTube more often than to any other site.” This was written by Kevin Roose of the New York Times in his investigation into how YouTube radicalizes people.6
In 2018, it was revealed10 by YouTube's Chief Product Officer that seventy percent of the total time users spend watching YouTube videos comes from recommended videos.
2012: From Views to Watch Time
You saw in Chapter 1 that online journalism has adopted the pageview as its primary currency: the single metric that determines ad revenue and defines success. In the early years of YouTube, the success of a video was similarly measured by the number of views it received, but there was a big problem with this: ads are dispersed throughout videos, so users who leave a video early do not see all the ads. Two videos with the same number of views might generate very different amounts of ad revenue if one gets users to watch longer and therefore see more ads. This suggests that the combined amount of time all users spend on a video (called watch time) is a better proxy for the value of a video than the number of views. And keep in mind it's not just content creators who earn money from ad revenue—YouTube's corporate profits come from ad revenue, so YouTube the company needs users to watch videos for as long as possible. Accordingly, in 2012, YouTube made a fundamental and lasting change to its recommendation algorithm: instead of aiming to maximize views, it would aim to maximize watch time.
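A toy calculation makes the economic logic concrete. The numbers below are invented purely for illustration, but they show how two videos with identical view counts can have very different advertising value once watch time is taken into account.

```python
# Toy comparison of two videos with equal views but different watch time.
# All numbers are invented for illustration.
videos = {
    "video_A": {"views": 100_000, "avg_minutes_watched": 2.0},
    "video_B": {"views": 100_000, "avg_minutes_watched": 9.0},
}
AD_EVERY_MINUTES = 4        # assume an ad break roughly every 4 minutes watched
REVENUE_PER_AD_VIEW = 0.01  # assumed revenue per ad impression, in dollars

for name, v in videos.items():
    watch_time = v["views"] * v["avg_minutes_watched"]  # total minutes watched
    ads_seen = v["views"] * (v["avg_minutes_watched"] // AD_EVERY_MINUTES + 1)
    print(name, f"watch time = {watch_time:,.0f} min,",
          f"estimated ad revenue = ${ads_seen * REVENUE_PER_AD_VIEW:,.0f}")
```

Counted by views, the two videos look identical; counted by watch time (and hence ads actually seen), one is worth several times the other.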
Just a month after the switch to watch time, YouTube made another key change: it started allowing all video creators—not just popular channels vetted by YouTube administrators—to run ads in their videos and earn a portion of the ad revenue. Thus, 2012 was an important year for YouTube in terms of both algorithmic and economic developments.
2015: Redesigned with Deep Learning
Jim McFadden, the technical lead for YouTube recommendations, commented13 on this shift to deep learning: “Whereas before, if I watch this video from a comedian, our recommendations were pretty good at saying, here’s another one just like it. But the Google Brain model figures out other comedians who are similar but not exactly the same—even more adjacent relationships.” And it worked: aggregate watch time on YouTube increased twentyfold in the three years that followed Google Brain’s involvement. However, one significant issue with deep learning is that it trades transparency for performance, and the YouTube recommendation algorithm is no exception. As McFadden himself put it: “We don’t have to think as much. We’ll just give it some raw data and let it figure it out.”
The Google Brain deep learning algorithm starts by whittling down the vast ocean of videos on YouTube to a small pool of a few hundred videos the user might like, based on the user's video watch history, keyword search history, and demographics. The demographic data include the geographic region the user is logged in from, the type of device they are using, and the user's age and gender if they have provided that information. The next step is to rank this small pool of videos from most highly recommended to least highly recommended, so that the algorithm can offer the videos it deems most likely to appeal to the user at the given moment. This ranking process relies on the user-specific predictors mentioned above but also on a few hundred video-specific predictors, including details of the user's previous interactions with the channel the video is from—such as how many videos the user has watched from the channel and when the user last watched one of its videos. To prevent the user from being shown the same list of recommended videos every time, the algorithm demotes the rank of a video whenever it is offered to the user and the user does not watch it.
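To make the two-stage pipeline concrete, here is a schematic sketch in Python. The data structures, similarity scores, and demotion rule are stand-ins of my own invention, not YouTube's actual code, but the overall shape (a candidate generation stage followed by a ranking stage that also demotes previously skipped videos) mirrors the description above.

```python
# Schematic two-stage recommender: candidate generation, then ranking.
# All scoring functions are placeholders, not YouTube's real models.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Video:
    id: str
    tags: frozenset        # topic tags, standing in for learned embeddings
    length_minutes: float

@dataclass
class User:
    watched_tags: set = field(default_factory=set)       # tags of videos already watched
    skipped_offers: dict = field(default_factory=dict)    # video id -> times offered, not watched

def similarity_to_history(user, video):
    # Placeholder for the learned notion of "videos like the ones you've watched."
    return len(user.watched_tags & video.tags)

def candidate_generation(user, catalog, pool_size=300):
    # Stage 1: whittle the full catalog down to a few hundred plausible videos.
    return sorted(catalog, key=lambda v: similarity_to_history(user, v), reverse=True)[:pool_size]

def rank(user, candidates):
    # Stage 2: order the pool by predicted appeal; demote videos the user
    # has already been offered but chose not to watch.
    def score(v):
        predicted_watch_time = v.length_minutes * (1 + similarity_to_history(user, v))
        demotion = 0.9 ** user.skipped_offers.get(v.id, 0)
        return predicted_watch_time * demotion
    return sorted(candidates, key=score, reverse=True)

# Example: a user who has watched gardening videos sees gardening near the top.
catalog = [Video("a", frozenset({"gardening"}), 12), Video("b", frozenset({"chess"}), 8)]
user = User(watched_tags={"gardening"})
print([v.id for v in rank(user, candidate_generation(user, catalog))])  # ['a', 'b']
```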
2018: Deep Reinforcement Learning
The recommendation algorithm must strike a difficult balance between popularity and freshness. If it only recommends videos with large watch times (or other indicators of popularity such as views, upvotes, comments, etc.), then it will miss out on new content, on fresh videos that haven't yet gone viral but which might have the potential to do so. The recommendation algorithm must also strike a delicate balance between familiarity and novelty in the videos it selects for each individual user. It wants to recommend videos similar to the ones each user has already watched, since that's the most accurate guide to the user's personal tastes and interests, but if the videos are too similar to the ones the user has already seen, then the user might become bored and lose interest. The next big innovation brought in by the Google Brain team, in 2018, helps address these countervailing factors.
But what does this have to do with YouTube? Well, in 2018, the Google Brain team brought reinforcement learning to the recommendation algorithm. Here, the “game” the computer plays is to keep each user watching videos as long as possible, so the reward function is something like the total amount of watch time each user spends in a sequence of up next recommendations before leaving the site.
Prior to reinforcement learning, the recommendation algorithm would choose and then rank the up next videos by how long it estimates the user will watch each one individually. This is like playing chess by only looking one move ahead. With reinforcement learning, the algorithm develops long-term strategies for hooking the viewer. For example, showing someone a short video that is outside their comfort zone might only score a couple minutes of watch time, but if doing this brings the viewer to a new topic they hadn’t previously been exposed to, then the user might get sucked into this new topic and end up sticking around longer than if they had stayed in reliable but familiar territory. This is a long-term strategic aspect of YouTube recommendations, and it helps illustrate how reinforcement learning is well suited to tackle the delicate balances discussed earlier between popularity and freshness and between familiarity and novelty.
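The difference between the old one-step approach and the reinforcement learning approach can be captured in a few lines. In this toy sketch, the watch-time numbers are invented; the point is simply that a greedy choice and a cumulative-reward choice can disagree.

```python
# Toy contrast between greedy recommendation (one step ahead) and a
# reinforcement-learning-style strategy that maximizes cumulative watch time.
# All watch-time numbers are invented for illustration.

# Each option: (watch time now, watch time in the follow-up session it sets up)
options = {
    "familiar_video":  (10, 10),  # safe pick: 10 min now, 10 min next
    "new_topic_video": (3, 25),   # short now, but hooks the user on a new topic
}

greedy_choice = max(options, key=lambda k: options[k][0])
lookahead_choice = max(options, key=lambda k: sum(options[k]))

print("Greedy (one step ahead):", greedy_choice)          # familiar_video
print("Cumulative watch time:  ", lookahead_choice)       # new_topic_video
```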
YouTube engineers have released some other technical papers on video recommendation systems (e.g., another one18 in 2019), but we don't know which, if any, of these have been absorbed into the official YouTube algorithm and which are just publications providing lines on the authors' résumés. In many ways, the 2016 deep learning paper was the last major close-up view that has been offered from the inside.
20 See Footnote 2.
YouTube in Brazil
In Brazil, the fourth largest democracy on the planet, YouTube has become more widely watched than all but one TV channel.21 Jair Bolsonaro, the country's authoritarian far-right president, was not long ago a fringe lawmaker with little national recognition who peddled conspiracy videos and extremist propaganda on his YouTube channel. In a relatively short span of time, his channel grew massively in subscribers and provided him with a sizable cult following. He rode this wave of YouTube popularity to presidential victory in 2018, and he wasn't alone. A whole movement of far-right YouTube stars ran for office along with Bolsonaro; many of them won their races by historic margins, and most of them now govern through YouTube the way Trump did with Twitter up until the end of his presidency.
In response to the New York Times investigation, a spokesperson for YouTube said that the company has since "invested heavily in the policies, resources and products" to reduce the spread of harmful misinformation. Well, that's reassuring.
Having confirmed that YouTube was indeed recommending far-right propaganda leading up to and following Brazil’s 2018 election, two important questions need to be addressed next: did the recommendations actually convert people ideologically, and why did the recommendation algorithm—designed by generally left-leaning Silicon Valley computer scientists—favor far-right videos? I am not aware of a rigorous quantitative study addressing the first of these two questions; the anecdotal evidence, however, and firsthand experience of members working inside the political movements, while not unequivocal, suggest the answer is yes. Let me turn to this now.
It's certainly not true that the political videos YouTube recommended were exclusively far right—it's just that they were vastly, disproportionately so. One driving factor for this is essentially psychological: some of the emotions that tend to draw people into content and keep them tuned in (thereby maximizing the parameter YouTube's algorithm was designed to optimize, watch time) are fear, doubt, and anger—and these are the same emotions that right-wing extremists and conspiracy theorists have relied on for years. In addition, many right-wing commentators had already been making long video essays and posting video versions of their podcasts, so YouTube's switch from views to watch time inadvertently rewarded the platform's far-right content creators for doing what they were already doing.
In other words, it may not be the case that far-right provocateurs strategically engineered their message to do well in YouTube's recommendation system—and it's almost certainly not the case that YouTube deliberately engineered its algorithm to support far-right content. Instead, the two seem to have independently reached similar conclusions on how to hook an audience, resulting in an accidental synergy.
The recommendation algorithm didn’t just increase the viewership of fake news and conspiracy theories on YouTube, it also provided an air of legitimacy to them. Even if a particular conspiracy theory seems blatantly implausible, as YouTube recommends a sequence of videos from different creators on the same topic mimicking each other, the viewer tends to feel that all signs are pointing to the same hidden truth. Debora Diniz, a Brazilian women’s rights activist who became the target of an intense right-wing YouTube conspiracy theory smear campaign, said24 this aspect of the algorithm makes it feel “like the connection is made by the viewer, but the connection is made by the system.”
This phenomenon can be seen in topics outside of politics as well. Doctors in Brazil found that not long after Google Brain's 2015 redesign of the recommendation algorithm, patients would come in blaming Zika on vaccines and insecticides (the very insecticides that in reality were being used to limit the spread of the mosquito-borne disease). Patients also were increasingly refusing crucial professional medical advice due to their own "YouTube education" on health matters. The Harvard researchers involved in the New York Times investigation of YouTube in Brazil found that "YouTube's systems frequently directed users who searched for information on Zika, or even those who watched a reputable video on health issues, toward conspiracy channels." A YouTube spokesperson confirmed these findings and said the company would change how its search tool surfaced videos related to Zika (a band-aid on a bullet wound, in my opinion). Why did people create these harmful medical disinformation videos in the first place, and why did YouTube recommend them so heavily?
Playing the Game
Remember from the timeline of YouTube’s algorithm how in 2018 Google Brain brought in a machine learning technique called reinforcement learning—more commonly used for playing games—that allows the recommendation algorithm to develop long-term strategies for sucking in viewers? At an AI conference in 2019, a Google Brain researcher said25 this was YouTube’s most successful adjustment to the algorithm in two years in terms of driving increased watch time. She also said that it was already altering the behavior of users on the platform: “We can really lead the users toward a different state, versus recommending content that is familiar.” This is a dangerous game to play when that different YouTube state is a chain of far-right conspiracy videos which might ultimately have led to a different political state for all citizens of Brazil—a xenophobic, anti-science, authoritarian state. “Sometimes I’m watching videos about a game, and all of a sudden it’s a Bolsonaro video,” said26 a seventeen-year-old high school student in Brazil, where the voting age is sixteen.
Stirring Up Electoral Trouble in 2020
On election day in 2020, hours before any of the polls had closed, eight videos out of the top twenty in a YouTube search for “LIVE 2020 Presidential Election Results” were showing similar maps with fake electoral college results.28 One of the channels in this list had almost one and a half million subscribers, and several of the channels were “verified” by YouTube. The top four search results for “Presidential Election Results” were all fake. Curiously, most of the YouTube channels coming up in election day searches for election results were not even affiliated with political or news organizations—they were just people opportunistically using the election to snag some easy ad revenue.
In addition to direct viewership, another way that YouTube is shifting political discourse in America is through a sort of ripple effect where YouTube serves up sizable audiences to various individuals who then reach even more massive mainstream audiences on traditional media outlets. The following story illustrates this dynamic.
In the first weeks of the coronavirus pandemic in January 2020, a medical researcher in Hong Kong named Dr. Li-Meng Yan had, based on unsubstantiated rumors (which, it later turned out, were entirely fabricated), started to believe that the virus was a bioweapon manufactured by the government in mainland China and deliberately released on the public. To spread a message of warning, she reached out to a popular Chinese YouTube personality, Wang Dinggang, known for criticizing the Chinese Communist Party. Dr. Yan portrayed herself as a whistleblower and anonymous source to Dinggang,
The same Chinese YouTube host, Dinggang, is also believed33 to have been the first to seed baseless child abuse rumors about Hunter Biden—rumors that spread from his YouTube channel to InfoWars and then to the mainstream press in the New York Post. In this way, YouTube provides a powerful entry point for the dangerous vertical propagation phenomenon of fake news studied in Chapter 1.
Researchers found34 that people who believe in conspiracy theories tend to rely more heavily on social media for information than do the less conspiratorially inclined segments of the population. Specifically, sixty percent of those who believe that COVID-19 is caused by radiation from 5G towers said that “much of their information on the virus came from YouTube,” whereas this figure drops to fourteen percent for those who do not believe this false conspiracy. People who ignored public health advice and went outside while having COVID symptoms were also much more reliant on YouTube for medical news and information than the general public.
2. Select at random one of the top five “up next” recommended videos.
3. Repeat the previous step four times.
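In code, a random walk audit of this kind might look like the sketch below. The FAKE_UP_NEXT table and the get_up_next helper are hypothetical stand-ins of my own; a real study would have to scrape or query YouTube itself, and this is not the researchers' actual code.

```python
# Sketch of a "random walk" audit of the recommendation system, in the spirit
# of the procedure described above.
import random

# Hypothetical stand-in for YouTube's "up next" list: a tiny made-up table
# mapping each video to five recommended follow-ups.
FAKE_UP_NEXT = {
    "start": ["a", "b", "c", "d", "e"],
    "a": ["b", "c", "d", "e", "start"],
    "b": ["c", "d", "e", "start", "a"],
    "c": ["d", "e", "start", "a", "b"],
    "d": ["e", "start", "a", "b", "c"],
    "e": ["start", "a", "b", "c", "d"],
}

def get_up_next(video_id):
    return FAKE_UP_NEXT[video_id]

def random_walk(start_video_id, picks=5):
    # One starting video, then five random selections from the top five
    # "up next" recommendations, matching the steps listed above.
    path = [start_video_id]
    current = start_video_id
    for _ in range(picks):
        current = random.choice(get_up_next(current)[:5])
        path.append(current)
    return path

print(random_walk("start"))
```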
Such an aggregate result might mask a lot of important variation, and it says nothing about users who specifically seek out news-related content among their recommendations. There are other ways of probing the recommendation algorithm, as you'll soon see, and a finer-tooth comb reveals a much darker story.
36 See Footnote 2.
While Franchi is a professional (of sorts) fully devoted to his channel, the Guardian found that even the “amateur sleuths” and “part-time conspiracy theorists,” who typically received only a few hundred views on their videos,
It should be noted that tracking of comment activity provides only a limited and possibly distorted window into YouTube viewership because the large majority of viewers do not comment (and one cannot presume that the commenters are representative of the full population of viewers), and also many comments are from viewers refuting the claims in the video and debating with, or simply trolling, the supporters. This research project did not consider the content of comments. Moreover, the fact that commenters flow to further
37 Manoel Horta Ribeiro et al., "Auditing Radicalization Pathways on YouTube," Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 131–141.
Contradictory Results
Another empirical study39 of the recommendation algorithm was conducted in 2019 and, like the Pew random walk investigation, found that YouTube’s recommendation algorithm did the opposite of radicalize—it “actively discourages viewers from visiting radicalizing or extremist content. Instead, the algorithm is shown to favor mainstream media and cable news content over independent YouTube channels.” The authors, Ledwich and Zaitsev, even assert “we believe that it would be fair to state that the majority of the views are directed towards left-leaning mainstream content.”
The authors subsequently rebutted41 this rebuttal. It's hard to know what to make of this debate; both sides appear to raise valid points while reaching diametrically opposed conclusions.
41 Anna Zaitsev, "Response to further critique on our paper 'Algorithmic Extremism…'"
42 Marc Faddoul, Guillaume Chaslot, and Hany Farid, "A Longitudinal Analysis of YouTube's Promotion of Conspiracy Videos," preprint, March 6, 2020.
The Role of Viewing History
One potentially significant limitation with all the data-driven investigations of the YouTube recommendation algorithm discussed in this chapter—the Harvard team working for the New York Times to explore the situation in Brazil, the Pew random walk empirical study, the Chaslot political database analyzed by the Guardian, the Ledwich-Zaitsev paper, and the Chaslot-Berkeley longitudinal study—is that they all rely on a logged-out anonymous user. This means a user without prior viewing history, search history, or demographic information.
Another Algorithmic Misfire
The recommendation algorithm isn’t YouTube’s only algorithmic culprit when it comes to spreading fake news and disinformation. Another company algorithm automatically puts together videos on the platform into channels it creates on various topics. For instance, CNN posts all its videos to its official YouTube channel, but YouTube’s internal algorithm also creates channels for each of the network’s popular shows. The problem is that not all the videos this algorithm finds are authentic. Well, sort of.
The shift toward automated content moderation brought an increase in incorrect video removals. Around eleven million videos were taken down during this six-month period, which is roughly twice the normal rate. Over three hundred thousand of these takedowns were appealed, and half of the appeals were successful. YouTube's Chief Product Officer (the same one who earlier in this chapter revealed that seventy percent of YouTube watch time comes via the recommendation algorithm) revealingly said that the machine learning approach to content moderation "couldn't be as precise as humans," so the company decided to err on the side of caution during this period. He also pointed out that of these eleven million videos, more than half were removed before a single actual YouTube user watched them. "That's the power of machines," he said.
What the current state of the matter is, and whether YouTube's internal adjustments to the algorithm have been enough to keep pace, is unclear. What is clear is that by allowing this misinformation arms race to take place behind closed doors in the engineering backrooms of Google, we are placing a great deal of trust in the company itself.
There are some grassroots efforts to counterbalance the far-right content on YouTube by creating far-left content that indulges in the same sensationalist techniques that seem to have resulted in large view counts and watch times propped up by the recommendation algorithm. One of the main instances of this is a group called BreadTube, whose name is a reference to the 1892 book The Conquest of Bread by the Russian anarchist/communist revolutionary Peter Kropotkin. While I can understand the short-term desire to rebalance the system by hijacking methods from the alt-right movement, in the long term this really does not seem like a healthy way to correct the problem that YouTube has unleashed on society. Without resorting to such extreme methods, we can either trust YouTube to keep fixing the problem on its own in secret or we can push for more transparency, accountability, and government regulation; my vote is for the latter.
One reason for this view is that even if YouTube voluntarily takes a more proactive stance in the fight against fake news, other video platforms will step in to fill the unregulated void left in its place. In fact, this is already happening: Rumble is a video site founded in 2013 that has recently emerged as a conservative and free speech–oriented alternative to YouTube (similar to the role Parler plays in relation to Twitter). The founder and chief executive of
Rumble said57 that the platform has been on a “rocket ship” of growth since summer 2020 that has only accelerated since the election. He said the platform “prohibits explicit content, terrorist propaganda and harassment,” but that it was “not in the business of sorting out misinformation or curbing speech.” This suggests to me that we should not just leave it up to individual companies to decide how and how much to moderate their content—we need a more centralized, cohesive approach in order not to fall hopelessly behind in the fight against fake news. I’ll return to this discussion in a broader context in Chapter 8.
While waiting for legislative efforts to address this problem, it is important in the meantime to look carefully at the technical tools we have at our disposal. In the next chapter, I’ll explore whether recent lie detection algorithms powered by machine learning could be used to detect disinformation in online videos.
C H A P T E R
5
Polygraph
Can Computers Detect Lies?
Recent years have brought an algorithmic reinvention of the polygraph. How well it works and what it has been used for are the main questions explored in this chapter. To save you some suspense: this approach would create almost as much fake news as it would prevent—and claims to the contrary by the various companies involved in this effort are, for lack of a better term, fake news. But first, I'll start with the fascinating history of the traditional polygraph to properly set the stage for its AI-powered contemporary counterpart.
The United States entered World War I in 1917, and Marston saw an important application of their incipient technology: catching spies. He pitched this idea to various government officials, and he succeeded in convincing the National Research Council to form a committee to consider “the value of methods of testing for deception” that he proposed. Two weeks later, Marston enthusiastically wired to the committee chair the following brief note: “Remarkable results thirty deception tests under iron clad precautions letter following.” The letter that followed elaborated the experiments Marston had conducted with colleagues. The first batch of subjects in these tests were primarily women at Harvard sororities, and a second batch of subjects came
As Marston neared the completion of his studies at Harvard, his correspondence with the National Research Council turned sharply from simply requesting financial support for his research to securing employment directly within the government. This development appears to have been strongly motivated by the recognition that finishing his degrees meant no longer being a student—and hence being eligible for the wartime draft. Although he always envisioned himself as a university professor, a governmental research position was unquestionably more to his liking than armed service.
He was successful in this pursuit of government employment: by 1918, he was working for a medical support unit within the War Department (the more honestly named agency that in 1949 evolved into the Department of Defense). In this position, he continued his experiments on the lie detector, and he claims to have achieved ninety-seven percent accuracy in tests undertaken in the Boston criminal court using his systolic blood pressure device. He later wrote about using his device on spies during the late teens and throughout the 1920s, and in 1935 J. Edgar Hoover, as director of the FBI, officially inquired into Marston's work, but there are no surviving public details on Marston's espionage work. Marston eventually segued back into academia and lived out his remaining years as a professor; this provided him the intellectual freedom to take his research on lie detection in any direction he wanted without having to convince superiors up the chain of command of the merits in doing so.
Enter Marston, expert witness for Frye's defense, who believed he could establish Frye's innocence by using his lie detector to prove that Frye was being truthful when he explained why his confession was untruthful.3
2 Kenneth Weiss, Clarence Watson, and Yan Xuan, "Frye's Backstory: A Tale of Murder, a Retracted Confession, and Scientific Hubris," Journal of the American Academy of Psychiatry and the Law, 2014, Volume 42, no. 2, pages 226–233.
However, the judge was unswayed and objected to the use of an unknown and unproven tool. The case was appealed up to the DC circuit court, which agreed with the trial court judge’s skeptical view of Marston’s device and testimony. The appellate ruling included a remark on the admissibility of expert witness testimony more generally, a remark that became known as the Frye standard, asserting that expert opinion is admissible if the scientific technique on which the opinion is based is “generally accepted” as reliable in the relevant scientific community. This general acceptance standard for admissibility of scientific evidence from the Frye case is still the verbatim law in some jurisdictions today, and even in jurisdictions where it is not, the law is essentially just a more detailed and elaborate version codified4 in the so-called Federal Rule 702.
In short, while Marston hoped to make history by showing how his lie detec-tion device could prove innocence, instead he made history by forcing the court system to articulate what kinds of expert testimony should not be allowed—and his landed squarely in this disallowed category. In fact, not only did his device fail the Frye standard at the time in 1923, but the Frye standard has kept polygraph tests, even in their more modern incarnations, out of the courtroom for nearly one hundred years now. (A swing and a miss there, perhaps, but he definitely hit a home run with his other enduring creation: the comic book super heroine he proposed to DC Comics in 1940 while working as a consultant for them—a character named Wonder Woman who was equipped with the “Lasso of Truth,” a whip that forces anyone it ensnares to tell the truth. Wonder Woman drew inspiration from Marston’s wife, Elizabeth Holloway, and the idea for the Lasso arose from their influential joint research into the psychology of human emotions.)
The name "polygraph" comes from the Greek roots for "many" and "to write." In addition to blood pressure, Marston's post-Frye polygraph measured breathing rate and sweatiness (the latter via skin conductance).
The modern polygraph is sometimes attributed to another inventor from around the same time on the opposite coast: Berkeley police officer and forensic psychologist John Augustus Larson, whose device also used a systolic blood pressure monitor and produced a continuous recording of the measurements. The details of who invented what and when are a little murky, and both these men based their inventions and ideas on earlier attempts (and, as mentioned earlier, Marston’s work was really in collaboration with his wife). Whatever the case was back then, the relevant fact is that the polygraphs we know today—well, the ones prior to the recent AI-based systems that I’ll soon discuss—are only small variants of these early 20th-century devices put forth by Marston and Larson.
6 See Footnote 5.
7 "Use of Polygraphs as 'Lie Detectors' by the Federal Government," H. Rep. No. 198, 89th Congress.
It is quite possible that structural racism plays a role too by, for instance, causing Black people to be more nervous during governmental interrogations. It is rather surprising how little research has been conducted on bias in polygraphs, especially considering how widespread their use is in the public sector.
Now our history of the traditional polygraph is complete, and, with the stage properly set, we step into the world of algorithmic lie detection.
As the name suggests, EyeDetect relies not on blood pressure or skin conductivity or respiration rates like the traditional polygraph; instead, its focus is on the windows to the soul: the eyes. Perhaps an even more significant difference is that, in stark contrast to traditional polygraphs, EyeDetect does not involve a human examiner to interpret the readings and decide what is a lie and what is truthful—EyeDetect reaches its conclusions in an automated, algorithmic manner by applying machine learning. Indeed, EyeDetect was fed close-up video footage of subtle eye movements for participants who were telling the truth and also for participants who were lying, and the algorithm used this as training data to determine what honesty and dishonesty look like in the eyes.
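Converus has not published its model, so the following is only a generic sketch of how any supervised classifier of this kind gets trained, using the scikit-learn library on made-up eye-movement features. It is meant to show the workflow, not to reproduce EyeDetect.

```python
# Generic supervised-learning sketch (NOT Converus's actual model): train a
# classifier on made-up eye-movement features labeled truthful (0) or deceptive (1).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features per exam: blink rate, pupil dilation, fixation duration, ...
X = rng.normal(size=(n, 4))
y = rng.integers(0, 2, size=n)   # fake labels, just to make the sketch run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("Accuracy on held-out exams:", model.score(X_test, y_test))
# Crucially, the model can only reflect whatever population generated the
# training data, which is the failure mode described in the surrounding text.
```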
Taking the human interpreter out of the equation certainly helps to create an air of impartiality, but does this algorithmic approach actually yield reliable and unbiased results? Sadly, no. The past several years have taught us that algorithmic bias is a serious and fundamental issue in machine learning. Not only do algorithms absorb bias that inadvertently creeps into data sets used for training, but the algorithms reproduce and often amplify this bias. Much has been written about this pernicious data-driven feedback loop phenomenon in general,13 and I’ll return to it here in the specific setting of lie detection shortly.
Converus claims its system attains an eighty-six percent accuracy rate, better than the roughly seventy percent estimated for traditional polygraphs (the company goes so far as to assert that EyeDetect is “the most accurate lie detector available”). However, the Wired investigation points out that “The only peer-reviewed academic studies of Converus’ technology have been carried out by the company’s own scientists or students in their labs,” which is not particularly reassuring and screams of an obvious conflict of interest. Eyebrows are usually raised when a private company funds research into the efficacy of a product that the company aims to profit from—but here the company didn’t just fund the research, it conducted the research itself behind closed doors. John Allen, a psychology professor not involved with Converus, was asked by Wired to read a couple of the company’s academic papers in order to try to assess the situation. This is what he had to say: “My kindest take is that there is some promise, and that perhaps with future independent research this test might provide one measure among many for formulating a hypothesis about deceptive behavior. But even that would not be definitive evidence.” Not exactly a glowing recommendation.
And these academic papers only cover the more successful experiments conducted by Converus; the company's first field test revealed a glaring weakness in the system, yet the results of this experiment were never published. The chief scientist at Converus, who is also the cocreator of EyeDetect, later admitted what happened during this first field test: "Although the data were limited, the [test] appeared to work well when we tested well-educated people who had applied to work for an airline, but the [test] was ineffective when we tested less well-educated applicants for security companies." This remark very much suggests that the machine learning algorithms powering EyeDetect were trained on a highly selective and nonrepresentative sample of the population, which is a common recipe for biased and unreliable results.
At an even more fundamental level, the chief scientist's remark raises a striking question that the company seems to have left unanswered: why would one's visual indicators of deceit depend on one's level of education? The mythology of lie detection is that efforts to conceal deception are innate and universal, unvarying across populations—yet this failed field test shows that this is not at all the case. This observation should have rattled the very foundations of Converus' endeavor, but instead the company seems to have just swept it under the rug and thrown more data and more neural network layers at the problem. As a further indication of problematic non-universality, consider the following memo (also revealed in the Wired investigation) that a Converus marketing manager wrote to a police department client in 2016: "Please note, when an EyeDetect test is taken as a demo […] the results are often varied from what we see when examinees take the test under real test circumstances where there are consequences." Something is very fishy here—and it gets worse.
By design, the EyeDetect system allows the examiner to adjust the sensitivity, meaning the threshold at which the algorithm declares a lie to have been detected. The idea behind this is that certain populations historically might be more truthful than others, so the system will produce more accurate results if it is calibrated to the population baseline level when examining each individual. In the words of the president and CEO of Converus, Todd Mickelsen: “This gives all examinees a fairer chance of being classified correctly. Most organizations can make good estimates of base rates by considering the number of previously failed background checks, interview data, confessions, evidence, etc.” Really?
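Here is one way such base-rate calibration could work in code. The threshold rule and the numbers are my own assumptions for illustration, not Converus's published procedure, but they show how the same raw reading can yield different verdicts depending on the presumed honesty of the examinee's population.

```python
# Illustration of base-rate-dependent thresholding (an assumption about what
# "calibrating to the population" could mean; not Converus's published method).
def classify(deception_score, assumed_base_rate):
    # The higher the assumed base rate of lying in a population,
    # the lower the score needed to be flagged as deceptive.
    threshold = 0.8 - 0.5 * assumed_base_rate
    return "deceptive" if deception_score > threshold else "truthful"

score = 0.62  # the same raw eye-movement score for two examinees
print(classify(score, assumed_base_rate=0.10))  # population presumed honest -> "truthful"
print(classify(score, assumed_base_rate=0.50))  # population presumed dishonest -> "deceptive"
```

The same score is labeled truthful in one population and deceptive in the other, which is precisely what makes the feedback loop described next so worrisome.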
The Wired investigation astutely points out yet another troublesome issue with EyeDetect: “Its low price and automated operation also allow it to scale up in a way that time-consuming and labor-intensive polygraph tests never could.” If it worked perfectly and had no harmful consequences, then scaling up would be great—but given the many flaws already discussed, scaling up is very dangerous. In fact, this is where pernicious data-driven feedback loops come into play, as I next explain.
It is widely recognized now that facial recognition software trained on one racial population does not perform well on other populations, and essentially all machine learning algorithms developed in the United States perform worse on Black and Brown faces than on white faces.16 I would be shocked if EyeDetect were somehow an exception to this pattern, which means EyeDetect very likely produces more false positives for Black and Brown examinees than it does for white examinees. When EyeDetect is then used en masse for employment screenings, Black and Brown people in the aggregate are unfairly kept out of the workforce. Denying these populations jobs exacerbates the already significant racial wealth gap in the United States, pushing more Black and Brown people into poverty.
Poor communities are also policed more heavily, leading to more arrests of Black and Brown residents compared to wealthy white communities.17 So in the aggregate, even if just by a small amount, by making it harder for Black and Brown people to land jobs, EyeDetect is pushing these populations into environments where they are more likely to get arrested. Now here's the real kicker: this higher arrest rate leads to a higher failure rate for background checks (even the old-fashioned kind, because arrest records are one of the main tools used for those), which in turn boosts the "base rates" Mickelsen mentioned that are used to adjust the sensitivity in EyeDetect. The horrible irony is that this just further exacerbates the racial discrepancy in EyeDetect's output.
20 Jake Bittle, "Lie detectors have always been suspect. AI has made the problem worse," MIT Technology Review, March 13, 2020.
22 See Footnote 20.
Avatar is the commercialization of a research project undertaken by academics at the University of Arizona over the past several years. Prototypes are known26 to have been tested at an airport in Romania and a US border port in Arizona. It was also tested by the Canada Border Services Agency, but only in a laboratory setting, and the official response was less than enthusiastic: “a number of significant limitations to the experiment in question and to the technology as a whole led us to conclude that it was not a priority for field testing.” DSI says these tests yielded accuracy rates between eighty and eighty-five percent. While certainly better than random guessing, that sure leaves a lot of incorrect assessments in the field. Nonetheless, it was reported27 in August 2019 that Discern had struck a partnership with an unnamed but well-established aviation organization and was planning on marketing Avatar to airports in a matter of months. DSI’s website currently says that the “Initial markets for the application of the deception detection technology will be at airports, government institutions, mass transit hubs, and sports stadiums.”
From Video to Audio and Text
Neuro-ID sells a related product to companies to help detect fraud. As always with this kind of thing, false positives are a serious issue. A Neuro-ID spokesperson clarified28 the intended use of this product: "There's no such thing as behavior-based analysis that's 100% accurate. What we recommend is that you use this in combination with other information about applicants to make better decisions and catch [fraudulent clients] more efficiently."
As the prominent data scientist Cathy O'Neil pointed out,30 the behavior of someone instructed to lie in a lab setting is very different from that of a practiced liar in the real world with skin in the game. She bluntly called this a "bad study" that has "no bearing" on being able to catch a seasoned liar in the act. In a similar vein, Kate Crawford, cofounder of the AI Now Institute at New York University, noted that this experiment was detecting "performance" rather than authentic deceptive behavior.
The polygraph has a long and winding history, starting with the work of Marston (the creator of Wonder Woman) and his wife Holloway, and also that of a Berkeley police officer named Larson, between 1915 and 1921. Marston convinced the government to investigate the efficacy of his invention, but the official response was skepticism rooted in common sense and historical insight. Determined to make a revolutionary impact, Marston attempted to use his device to establish the innocence of a defendant in a 1923 murder trial, but his efforts were dismissed by the court and resulted instead in the Frye standard that still stands as the law today: expert witness testimony is admissible in court only if the technology it is based on is generally accepted by the scientific community. Polygraphs did not pass that test then, and neither do the new AI-powered algorithmic variants today.
31 See Footnote 20.
C H A P T E R
6
Gravitating to
—Alex Halavais, Search Engine Society
Billions of people turn to Google to find information, but there is no guarantee that what you find there is accurate. As awareness of fake news has risen in recent years, so has the pressure on Google to find ways of modifying its algorithms so that trustworthy content rises to the top. Fake news is not limited to Google's main web search platform—deceptive and harmful content also plays a role on other Google products such as Google Maps, Google News, and Google Images, and it also shows up in Google's autocomplete suggestions.
On the morning of November 14, 2016, six days after the US presidential election in which Trump won the electoral college and Clinton won the popular vote, both by relatively wide margins, the top link in the “In the news” section of the Google search for “final election results” was an article asserting that Trump had won the popular vote by seven hundred thousand votes.1 It was from a low-quality WordPress blog that cited Twitter posts as its source, yet somehow Google’s algorithms propelled this fake news item to the very top. In response to this worrisome blunder, a Google spokesperson said:2“The goal of Search is to provide the most relevant and useful results for our users. We clearly didn’t get it right, but we are continually working to improve our algorithms.”
The next day, Sundar Pichai—just one year into his role as CEO of Google—was asked in an interview3 with the BBC whether the virality of fake news might have influenced the outcome of the US election. Mark Zuckerberg had already dismissed this idea (naively and arrogantly, it appears in hindsight) as “pretty crazy,” whereas Pichai was more circumspect: “I am not fully sure. Look, it is important to remember this was a very close election and so, just for me, so looking at it scientifically, one in a hundred voters voting one way or the other swings the election either way.” Indeed, due to the electoral college, the election came down to just one hundred thousand votes. When asked specifically whether this tight margin means fake news could have potentially played a decisive role, Pichai said, after a pause: “Sure. You know, I think fake news as a whole could be an issue.”
Less than a year later, Eric Schmidt, then the executive chairman of Alphabet, Google’s parent company, publicly admitted4 that Google had underestimated the potential dedication and impact of weaponized disinformation campaigns from adversarial foreign powers: “We did not understand the extent to which governments—essentially what the Russians did—would use hacking to control the information space. It was not something we anticipated strongly enough.” He made that remark on August 30, 2017.
Fake Business Information
In June 2019, the Wall Street Journal reported11 on the deluge of fake businesses listed on Google Maps. Experts estimated that around ten million business listings on Google Maps at any given moment are falsified and that hundreds of thousands of new ones appear each month. They claim that the “majority of listings for contractors, electricians, towing and car repair services and lawyers, among other business categories, aren’t located at their pushpins on Google Maps.” One motivation for someone to make
fake listings is to give a misleading sense of the reach of one’s business by exaggerating the number of locations and branch offices on Google Maps. Another motivation is to drown out the competition.
The owner of a cash-for-junk-cars business in the Chicago suburbs mostly relied on the Yellow Pages for advertising, but in 2018 he was contacted by a marketing firm that offered to broadcast his business on Google Maps—for a five-figure fee. He agreed, but then a few months later, the firm came back with a threat: if he doesn’t start giving them half his revenue, then they will bury his Google Maps listing under hundreds of fictitious competitors. He refused, and sure enough they posted an avalanche of made-up competitors with locations near him so that it would be very difficult for customers to find his business amid all the fake noise. He drove around a few Chicago neighborhoods and searched on his phone for auto salvage yards as he went; he said that more than half the results that came up were fake. These fake listings pushed his business listing off the first page of Google Maps search results, and soon his number of incoming calls dropped by fifty percent.
Google responded with a blog post12 written by the director of Google Maps titled "How we fight fake business profiles on Google Maps." This blog post was published on June 20, 2019—the exact same date as the Wall Street Journal piece. It does not take a great stretch of the imagination to see this conspicuously timed blog post as a strategic effort to reduce the backlash that would surely follow the publication of the Wall Street Journal investigation. The post includes some other staggering figures, including that Google Maps has over two hundred million places and that "every month we connect people to businesses more than nine billion times, including more than one billion phone calls and three billion requests for directions."
Some of the images of Black women that came up in this search for unprofessional hairstyles were from blog posts and Pinterest boards by Black women discussing racist attitudes about hair in the workplace. For instance, one top image was from a post criticizing a university's ban on dreadlocks and cornrows; the post illustrated the banned hairstyles by showing pictures of Black women with them and lamented how these hairstyles were deemed unprofessional by the university. The ban was clearly racist, whereas the post calling attention to it was the opposite: it was antiracist. The Google image search conflated these two contrasting aspects and stripped the hairstyle image of its context, simply associating the image with the word "unprofessional." In doing so, it turned an antiracist image into a racist one.
Just a few months after this Google hairstyle fiasco, a trio of researchers in Brazil presented a detailed study on another manifestation of racism in Google’s image search—one so abhorrent that the academic study was promptly covered prominently by the Washington Post.15 The researchers collected the top fifty results from Google image searches for “beautiful woman” and “ugly woman,” and they did this for searches based in dozens of different countries to see how the results vary by region. This yielded over two thousand images that were then fed into a commercial AI system for estimating the age, race, and gender of each person (supposedly with ninety percent accuracy). Here’s what they found.
In almost every country the researchers analyzed, white women appeared more in the search results for "beautiful," and Black and Brown women appeared more in the results for "ugly"—even in Nigeria, Angola, and Brazil, where Black and Brown populations predominate. In the United States, the results for "beautiful" were eighty percent white and mostly in the age range of nineteen to twenty-eight, whereas the results for "ugly" dropped to sixty percent white—and rose to thirty percent Black—and the ages mostly ranged from thirty to fifty, according to the AI estimates. This form of racism and ageism was not invented by Google's algorithm; it originates in society itself—but the algorithm picks up on it and then harmfully presents it to the world as an established fact. Thankfully, Google seems to have found ways of improving its algorithm in this regard, as image searches for beauty now yield a much more diverse range of individuals.
The problem here is that even the best AI algorithms today don't form abstract conceptualizations or common sense the way a human brain does—they just find patterns when hoovering up reams of data. (You might object that when I discussed deep learning earlier in this book, I did say that it is able to form abstract conceptualizations, but that's more in the sense of patterns within patterns rather than the kind of anthropomorphic conceptualizations we humans are used to.) If Google's algorithms are trained on real-world data that contains real-world racism, such as Black people being referred to as gorillas, then the algorithms will learn and reproduce this same form of racism.
Let me quickly recap the very public racist Google incidents discussed so far to emphasize the timeline. In May 2015, the Washington Post reported the Google Maps White House story that the office of the first Black president in
the history of the United States was labeled with the most offensive racial slur in existence. One week later, Google launched Google Photos and within a month had to apologize for tagging images of Black people as gorillas, a story covered by the Wall Street Journal, among others. Less than a year later, in April 2016, the Guardian reported that Google image searches for unprofessional hairstyles mostly showed photos of Black women. Just a few months after that, in August 2016, the Washington Post covered a research investigation that showed Google image search results correlated beauty with race. Oh, I almost forgot: two months earlier, in June 2016, it was reported in many news outlets, including BBC News,18 that doing a Google image search for “three black teenagers” returned mostly police mugshots, whereas searching for “three white teenagers” just showed smiling groups of wholesome-looking kids. These documented racist incidents are just a sample of the dangers inherent in letting Google’s data-hungry machine learning algorithms sort and share the world’s library of photographs.
Google Autocomplete
A Google spokesperson said the company took action within hours of being notified of the offensive autocompletes uncovered by the Guardian article.
An investigation24 was published in Wired just a few days after this UK hearing, finding that “almost a year after removing the ‘are jews evil?’ prompt, Google search still drags up a range of awful autocomplete suggestions for queries related to gender, race, religion, and Adolf Hitler.” To avoid possibly misleading results, the searches for this Wired article were conducted in “incognito” mode, meaning Google’s algorithm was only using general search history data rather than user-specific data. The top autocompletes for the prompt “Islamists are” were, in order of appearance, “not our friends,” “coming,” “evil,” “nuts,” “stupid,” and “terrorists.” The prompt “Hitler is” yielded several reasonable autocompletes as well as two cringeworthy ones: “my hero” and “god.” The first autocomplete for “white supremacy is” was “good,” whereas “black lives matter is” elicited the autocomplete “a hate group.” Fortunately, at least, the top link for the search “black lives matter is a hate group” was to a Southern Poverty Law Center post explaining why BLM is not, in fact, a hate group. Sadly, however, one of the top links for the search “Hitler is my hero” was a headline proclaiming “10 Reasons Why Hitler Was One of the Good Guys.”
Strikingly, the prompt “blacks are” had only one autocomplete, which was “not oppressed,” and the prompt “feminists are” also only had a single autocomplete: “sexist.” Google had clearly removed most of the autocompletes for these prompts but missed these ones which are still biased and a potentially harmful direction to send unwitting users toward. Some things did improve in the year between the original Guardian story and the Wired follow-up. For instance, the prompt “did the hol” earlier autocompleted to “did the Holocaust happen,” and then the top link for this completed search was to the neo-Nazi propaganda/fake news website Daily Stormer, whereas afterward this autocomplete disappeared, and even if a user typed the full search phrase manually, the top search result was, reassuringly, the Holocaust Museum’s page on combatting Holocaust denial.
‘We fixed this one thing. We can move on now’.” It is important to remember that when it comes to hate, prejudice, and disinformation, Google—like many of the other tech giants—is up against a monumental and mercurial challenge.
One week after the Wired article, it was noted25 that the top autocomplete for “white culture is” was “superior”; the top autocompletes for “black culture is” were “toxic,” “bad,” and “taking over America.” Recall that in 2014, Ferguson, Missouri, was the site of a large protest movement responding to the fatal police shooting of an eighteen-year-old Black man named Michael Brown. In February 2018, the autocompletes for “Ferguson was” were, in order: “a lie,” “staged,” “not about race,” “a thug,” “planned,” “he armed,” “a hoax,” “fake,” “stupid,” and “not racist”; the top autocompletes for “Michael Brown was” were “a thug,” “no angel,” and “a criminal.”
25 Barry Schwartz, "Google Defends False, Offensive & Fake Search…"
26 I am trying to separate the topic of hate speech from the topic of fake news in this treatment of Google autocompletes. The dividing line, however, is admittedly quite blurred.
Autocomplete is somehow confirming. In these cases, our systems identify if there’s likely to be reliable content on a particular topic for a particular search. If that likelihood is low, the systems might automatically prevent a prediction from appearing.” You have to read that statement carefully: Google is not saying that it removes misinformative autocompletes, it is saying that it removes some autocompletes that would yield mostly fake news search results. I suppose the idea behind this is that if a user sees a false assertion as an autocomplete, the truth should be revealed when the user proceeds to search for that assertion—and only if the assertion is not readily debunkable this way should it be removed from autocomplete. But, to me at least, that doesn’t really jibe with the first sentence in Google’s statement, which makes it seem that the company is concerned about people seeing misinformative autocompletes regardless of the searches they lead to.
Do you remember Guillaume Chaslot, the former Google computer engineer you met in Chapter 4 who went from working on YouTube's recommendation algorithm on the inside to exposing its ills from the outside? On November 3, 2020—election day—he found that the top autocompletes for "civil war is" were, in order: "coming," "here," "inevitable," "upon us," "what," "coming to the us," and "here 2020." On January 6, 2021—the day of the Capitol building insurrection—he tried the same phrase and found the top autocompletes were no less terrifying: "coming," "an example of which literary term," "inevitable," "here," "imminent," "upon us."
“coronavirus is” included “not that serious,” “ending,” “the common cold,” “not airborne,” and “over now.” In fact, of the ten autocompletes for this search phrase, six were assertions that have been proven wrong. And once again the order of these autocompletes was unrelated to search popularity as measured by Google Trends. Chaslot found climate change denial/ misinformation persisted as well: three of the top five autocompletes for “global warming is” were “not caused by humans,” “good,” and “natural.” The phrase “global warming is bad” was searched three times as often as “global warming is good,” and yet the latter was the number four autocomplete, while the former was not included as an autocomplete.
The researchers go on to boldly suggest that “Google’s search algorithm, propelled by user activity, has been determining the outcomes of close elections worldwide for years, with increasing impact every year because of increasing Internet penetration.” I find this assertion to be a stretch—at least, the evidence to really back it up isn’t in their PNAS paper—but my focus in this book is not political bias and elections, it is fake news. And the researchers here did convincingly establish that search rankings matter and affect people’s views, which means there are real consequences when Google places fake news links highly in its search rankings.
35 Robert Epstein and Ronald Robertson, "The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections," Proceedings of the National Academy of Sciences, August 18, 2015, 112 (33), E4512–E4521.
Signals the Algorithm Uses
What factors determine how highly ranked pages are in Google searches? Once again, Google won’t reveal much about its algorithmic trade secrets—in part to prevent competitors from copying them, but also to prevent people from gaming the algorithm—so we only know the broadest outlines. The official company website describing the search algorithm38 states the following: “Search algorithms look at many factors, including the words of your query, relevance and usability of pages, expertise of sources, and your location and settings. The weight applied to each factor varies depending on the nature of your query—for example, the freshness of the content plays a bigger role in answering queries about current news topics than it does about dictionary definitions.” These factors are largely about finding links that appear to be good matches to the search query. When it comes to ranking the results, Google says the algorithm attempts to “prioritize the most reliable sources available” by considering factors that “help determine which pages demonstrate expertise, authoritativeness, and trustworthiness on a given topic.” This sounds good, but it’s quite vague. The two examples Google gives are that a
site is bumped up in the rankings if other prominent sites link to it (this is the essence of the original PageRank algorithm Google first launched with in 1998) or if many users visit the site after doing closely related searches.
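To make the link-counting idea concrete, here is a minimal sketch in Python of the power-iteration computation at the heart of the original PageRank method, run on a made-up four-page web; the damping factor and the tiny link graph are illustrative assumptions on my part, not details of Google’s actual system.

```python
# Minimal PageRank sketch on a toy web of four pages.
# links[p] lists the pages that p links out to (hypothetical data).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
damping = 0.85          # standard damping factor from the original PageRank paper
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):     # iterate until the scores settle
    new_rank = {}
    for p in pages:
        # Sum the rank flowing into p from every page that links to it
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

for p, r in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{p}: {r:.3f}")
```

Even in this toy example, the page collecting the most inbound links ends up with the highest score, which is exactly the property that link-network manipulators learned to exploit.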
Earlier in the history of the algorithm, the PageRank method played a more prominent role, and less attention was given to assessing the quality of information by other means. Nefarious actors figured out how to use this narrow focus on link counting to manipulate the rankings. In December 2016, it was reported39 that fake news and right-wing extremist sites “created a vast network of links to each other and mainstream sites that has enabled them to game Google’s algorithm.” This led to harmful bigotry and disinformation—for instance, eight of the top ten search results for “was Hitler bad?” were to Holocaust denial sites.
Indeed, in March 2017, it was found43 that asking Google which US presidents were in the KKK resulted in a snippet falsely claiming that several were; asking “Is Obama planning a coup?” yielded a snippet that said “According to details exposed in Western Center for Journalism’s exclusive video, not only could Obama be in bed with the Communist Chinese, but Obama may in fact be planning a Communist coup d’état at the end of his term in 2016!”; searching for a gun control measure called “Proposition 63” yielded a snippet falsely describing it as “a deceptive ballot initiative that will criminalize millions of law abiding Californians.” In the case of the Obama coup snippet, the top search result was an article debunking this fake news story about an upcoming coup attempt, but when using Google’s Home Assistant, there are no search results listed—all one gets is the featured snippet read aloud.
Some fake news publishers scrape articles from the Web and repost them as their own, both to give the stories a wider platform and to collect ad revenue in the process. Most fake news on the Web doesn’t violate any Google policy that would keep it out of search results, but if an article is found to be in violation of copyright, then Google will expunge it from its search listings—so these spammy copycat sites do sometimes get delisted.
But an extensive Wall Street Journal investigation45 found an unsettling twist here: people have been gaming Google’s copyright infringement request system in order to delist content that is unflattering or financially impactful to certain parties. One of the techniques used is backdating: someone copies a published article and posts it on their blog but with a misleading time stamp to make it appear that it predates the published article; then they tell Google that the published article is violating their blog’s copyright, and the published article is removed from Google search results. When this happens, Google is delisting an actual news article on the basis of a false copyright infringement notification. Daphne Keller, a former Google lawyer and currently a program director at Stanford University’s Cyber Policy Center, said that “If people can manipulate the gatekeepers to make important and lawful information disappear, that’s a big deal.” The Wall Street Journal found that not only can people indeed do this, but they have been doing so in surprisingly large numbers.
One of the clusters of fraudulent requests the Wall Street Journal found concerned Russian-language news articles critical of politicians and business leaders in Ukraine. These articles were taken off Google after various organizations, including a supposed Russian edition of Newsweek, filed copyright violation requests—but it turned out these organizations were all fake. The Russian Newsweek had nothing to do with the real Newsweek; it was just using Newsweek’s logo to deceive Google into thinking the copyright violation notification was legitimate.
There is a secondary harm to these deceptive methods for tricking Google into delisting real news articles: in Google’s recent efforts to elevate quality journalism, one factor the ranking algorithm considers is the number of copyright violations sites have received. A Google spokesperson said that “If a website receives a large number of valid takedown notices, the site might appear lower overall in search results.” But the Wall Street Journal investigation established that many of the takedowns that Google thinks are valid are actually invalid and the result of deliberate disinformation aimed at Google’s automated system. This opens the door for more gaming of Google’s rankings by dishonest actors.
While text generation is immensely useful, it turns out that for many applications one needs something that is essentially a by-product of the inner workings of a deep neural net that occurs automatically while training for tasks like text generation: a vector representation of words (sometimes called a word embedding). This means a way of representing each word as a vector—that is, a list of numerical coordinates—in such a way that the geometry of the distribution of word vectors reflects important semantic and syntactic information. Roughly speaking, we want words that frequently appear in close proximity to each other in written text to have vector representations that are geometrically in close proximity to each other. Vector embeddings translate data in messy formats like text into the standard numerical formats that machine learning algorithms know and love.
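As a toy illustration of what geometric proximity means here, the following sketch compares a few hand-made three-dimensional “word vectors” using cosine similarity; real embeddings are learned from text and have hundreds of dimensions, so the specific words and numbers below are purely illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: values near 1 mean "pointing the same way"
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hand-made, purely illustrative 3-dimensional "word vectors"
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

print(cosine_similarity(vectors["king"], vectors["queen"]))  # high: related words
print(cosine_similarity(vectors["king"], vectors["apple"]))  # lower: unrelated words
```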
Earlier word embeddings (one of the most popular, called Word2vec, was developed by Google in 2013) produced a fixed, static vector for each word. This opened the door to many breakthroughs: for example, analyzing the sentiment of words and sentences (how positive or negative they are) turns into a more familiar geometric analysis of vectors in Euclidean space, where, for instance, one looks for a plane that separates positive word vectors from negative word vectors. One of the key drawbacks in these early static approaches was that a single word can have multiple meanings (such as “stick” a landing, “stick” from a tree, and “stick” to a plan), and all the different meanings got conflated when the word was represented as a vector. In contrast, ELMo and BERT are contextual word embeddings, which means the vector representations are not fixed and static—they depend on the surrounding text. If you feed these systems the sentence “I hope the gymnast sticks the landing” and the sentence “the toddler sticks out her tongue,” the word “sticks” will have different vector representations in each case. This allows for much more flexibility in language modeling and understanding.
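For readers who want to see the contextual behavior directly, here is a rough sketch using the open source Hugging Face transformers library and the publicly released bert-base-uncased model (my stand-in for illustration, not the exact model Google Search uses); it pulls out the vector BERT assigns to “sticks” in each of the two sentences above and compares them.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence, word):
    # Run the whole sentence through BERT and pull out the hidden-state vector
    # for the target word (assumes the word maps to a single WordPiece token).
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # shape: (num_tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = contextual_vector("i hope the gymnast sticks the landing", "sticks")
v2 = contextual_vector("the toddler sticks out her tongue", "sticks")
similarity = torch.nn.functional.cosine_similarity(v1, v2, dim=0)
print(f"Same word, two contexts, cosine similarity: {similarity.item():.3f}")
```

A static embedding like Word2vec would return the identical vector for “sticks” in both sentences; a contextual model returns two different vectors, which is the whole point.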
This Google blog post went on to say that “when it comes to ranking results, BERT will help Search better understand one in 10 searches in the U.S. in English” and that “Particularly for longer, more conversational queries, or searches where prepositions like ‘for’ and ‘to’ matter a lot to the meaning, Search will be able to understand the context of the words in your query.” To illustrate the types of improvements users should expect, the blog post included an example of a user searching “Can you get medicine for someone pharmacy.” Previously, the top search result was an article that included each of these individual words, but it didn’t answer the question the user was attempting to ask; with the new BERT-powered search, the top result was an article specifically addressing when and how people can pick up medications for others at the pharmacy.
Google evidently underestimated itself with the “one in 10” figure, because almost exactly one year later in another company blog post47 Google declared
that “BERT is now used in almost every query in English.” And in September 2020, Google announced48 that BERT was also being used “to improve the matching between news stories and available fact checks.” In Chapter 9, I’ll cover fact-checking tools in depth; for now, what’s relevant here is that BERT is used to automatically scan through lists of human fact-check reports and figure out which ones pertain to a given news article.
But BERT also featured prominently in a recent debacle that landed Google in the news in an unflattering light. Stanford-trained computer scientist Timnit Gebru is one of the world’s leading experts on ethics in AI and algorithmic bias; she is a cofounder of the organization Black in AI; and, until recently, she was one of the leaders of Google’s Ethical Artificial Intelligence Team. But she was abruptly fired from Google in December 2020.49 She was working on a research paper with several other Google employees when a request from the higher-ups came in asking her to either withdraw the paper or remove the names of all Google employees from it. She refused and demanded to know who was responsible for this bizarre request and their reasoning behind it, but Google leadership rebuffed her demand and instead fired her.
Elevating Quality Journalism
The Government must make sure that online platforms bear ultimate responsibility for the content that their algorithms promote. […] Transparency of online platforms is essential if democracy is to flourish. Platforms like Facebook and Google seek to hide behind ‘black box’ algorithms which choose what content users are shown. They take the position that their decisions are not responsible for harms that may result from online activity. This is plain wrong. The decisions platforms make in designing and training these algorithmic systems shape the conversations that happen online.
The amount of information on the world wide web is extraordinarily large. There are billions of pages. We have no ability to manually evaluate all that content, but we have about 10,000 people, as part of our Google family, who evaluate websites. We have perhaps as many as nine opinions of selected pages. In the case of search, we have a 168-page document given over to how you determine the quality of a website. […] Once we have samples of webpages that have been evaluated by those evaluators, we can take what they have done and the webpages their evaluations apply to, and make a machine-learning neural network that reflects the quality they have been able to assert for the webpages. Those webpages become the training set for a machine-learning system. The machine-learning system is then applied to all the webpages we index in the world wide web. Once that application has been done, we use that information and other indicators to rank-order the responses that come back from a web search.
He summarized this as follows: “There is a two-step process. There is a manual process to establish criteria and a good-quality training set, and then a machine-learning system to scale up to the size of the world wide web, which we index.” Many of Google’s blog posts and official statements concerning the company’s efforts to elevate quality journalism come back to this team of ten thousand human evaluators, so to dig deeper into Cerf’s dense statement here, it would be helpful to better understand what these people do and how their work impacts the algorithm. Fortunately, an inside look at the job of the Google evaluator was provided in a Wall Street Journal investigation52 from November 2019.
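Cerf’s two-step description maps onto a very standard machine learning workflow. Here is a highly simplified sketch of that workflow using the open source scikit-learn library, with made-up page features and rater scores; the real system’s features, labels, and model are not public, so everything concrete below is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Step 1 (manual): human evaluators score a small sample of pages.
# Hypothetical features per page: [inbound links, reading level, ad density, citation count]
rated_page_features = np.array([
    [120, 12.0, 0.05, 30],   # well-sourced news article
    [  3,  6.0, 0.60,  0],   # thin, ad-stuffed page
    [ 45, 10.0, 0.10,  8],
    [  1,  5.0, 0.70,  0],
])
rater_quality_scores = np.array([0.9, 0.1, 0.7, 0.05])  # hypothetical ratings from the guidelines

# Step 2 (automatic): learn to predict quality from the rated sample,
# then apply the trained model to every page in the index.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(rated_page_features, rater_quality_scores)

unrated_pages = np.array([[80, 11.0, 0.08, 20], [2, 5.5, 0.65, 0]])
print(model.predict(unrated_pages))  # estimated quality score for each unrated page
```

The key point is the leverage: a few thousand human judgments become a model that can assign a quality estimate to billions of pages the evaluators will never see.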
While some of the notable updates to the Google search algorithm have been publicly announced53 by the company (several were mentioned in this chapter), Google actually tweaks its algorithm extremely often. In fact, the same Wall Street Journal investigation just mentioned also found that Google modified its algorithm over thirty-two hundred times in 2018. And the number of algorithm adjustments has been increasing rapidly: in 2017, there were around twenty-four hundred, and back in 2010 there were only around five hundred.
53 These announcements typically appear in Google blog posts, but a convenient list and description of the substantial ones…
Your reaction to this is, “WTF? How could that possibly happen?” The answer is that these systems do not recognise things in the same way we do. We abstract from images. We recognise cats as having little triangular ears, fur and a tail, and we are pretty sure that fire engines do not. But the mechanical system of recognition in machine-learning systems does not work in the same way our brains do. We know they can be brittle, and you just cited a very good example of that kind of brittleness. We are working to remove those problems or identify where they could occur, but it is still an area of significant research. To your primary question, are we conscious of the sensitivity and the potential failure modes? Yes. Do we know how to prevent all those failure modes? No, not yet.
In short, we trust Google’s algorithms to provide society with the answers to all its questions—even though it sometimes fans the flames of hate and fake news and we don’t entirely know how to stop it from doing so.
Fake news and harmful misinformation appearing at or near the top of Google search results became a widely discussed topic after Trump unexpectedly won the 2016 election. Many people started to blame Google and the other tech giants for their role in the election and in eroding the very notion of truth. Google responded by making a series of adjustments to its ranking algorithm—often with the assistance of an army of low-paid contract workers—over the ensuing years aimed at bringing trustworthy links to the top of searches and pushing less reliable ones lower down. In this chapter, I presented a variety of examples where this played out, gathered what technical details I could about the closely guarded search algorithm, and looked into the public statements and general strategies Google has employed in this effort to elevate quality journalism. I also discussed instances of misinformation, deception, hateful stereotyping, and blatant racism that surfaced on other corners of Google such as maps, image search, and autocomplete.
In the next chapter, I tackle another aspect of Google: how its advertising platform provides the revenue stream for a huge fraction of the fake news industry. Facebook is also brought into the fray, though its advertising platform fans the flames of fake news in a rather different way, as you will soon see.
News and Reinforces Racism
One of the incentives for a good portion of fake news is money.
The second half of the chapter turns attention to Facebook. Here, the issue is not that the company is funding fake news; it is that Facebook profits from fake news in the form of political advertisements and, in the process, exposes a massive audience to it. The chapter also takes a deep dive into Facebook’s algorithmically powered ad distribution system and details multiple dimensions of racism and discrimination that the system engages in. Here, too, I discuss the sequence of reluctant steps the company has taken to mitigate these problems.
Google Ads and Fake News
After the 2016 US election, there were some complaints from the public and from a handful of major advertisers over Google’s role in supporting hateful and factually misleading sites with ad revenue. Google responded6 with a statement that included the following assertion: “Moving forward, we will restrict ad serving on pages that misrepresent, misstate, or conceal information
Complaints about harmful ad placements continued, and in March 2017 Google tried to step up its game:8 “Starting today, we’re taking a tougher stance on hateful, offensive and derogatory content. This includes removing ads more effectively from content that is attacking or harassing people based on their race, religion, gender or similar categories.” But the CfA report found that content of this type remained in the Google Display Network. For instance, Breitbart was included in the network. Many advertisers explicitly pulled their ads from Breitbart once they realized their ads were being placed there, but this had to happen on an ad hoc basis since Google’s dashboard lacked the refined controls for preventing such ad placement in the first place. As the CfA report describes: “Under Google’s system, it is incumbent upon advertisers to identify and blacklist specific domains that they find objectionable. But Google doesn’t make this easy: its ad platforms don’t allow advertisers to block fake news sites as a category. […] Even if advertisers could identify specific extreme websites, Google offers these publishers a way to circumvent advertiser exclusions by making their sites anonymous.”
2019 Report
In September 2019, a UK nonprofit called the Global Disinformation Index (GDI) released a report9 analyzing ad placement on fake news sites. The researchers behind this report started by collecting a list of seventeen hundred websites that had been flagged by fact-checking organizations such as PolitiFact for publishing content that included fake news. They found that Google was serving up ads on a whopping seventy percent of these dubious sites; the second largest contributor was AppNexus, with ads on eight percent of the sites; coming in at third place was Amazon, with four percent.
The GDI report estimated that Google was responsible for almost forty percent of the fake news industry’s revenue10 that year; curiously, this is almost exactly the same share of ad revenue that Google was responsible for among well-respected, factual news sites. In other words, Google had equally dominant shares of the advertising market on both real and fake news sites.
This bears repeating: fake news is a big business, worth nearly a quarter billion dollars in 2019 in online ad revenue—at least according to the GDI estimates—and Google was responsible for a larger share of this revenue than any other advertising company. If Google really has been attempting to stanch the flow of funds to fake news sites since the 2016 election, evidently by 2019 there was still quite a lot of room for improvement in this regard.
In the conclusion of her paper, Harvard professor Latanya Sweeney ponders the causes of this unseemly algorithmic advertising behavior: “Why is this discrimination occurring? Is this Instant Checkmate, Google, or society’s fault? Answering […] requires further information about the inner workings of Google AdSense.” She goes on to explain that Google allows advertisers to provide multiple different ads for the same keyword search, and Google displays these in a probabilistic manner: initially, they are given equal odds of appearing on the search, but as people click the different versions at different rates, Google’s algorithm updates the probabilities so that the more popular ones are shown more frequently. Instant Checkmate said that the different messages in its ads were grouped by last name, not first name, so the fact that its arrest record ads appeared more frequently for typically Black first names than for typically white first names means that people were clicking the arrest record ads for Black first names more often. Thus, the racism here originated with society at large—but it was enabled and reinforced by Google’s algorithm.
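Here is a rough sketch of the kind of click-feedback loop Sweeney describes, with entirely made-up ad copy and click rates; it is not Google’s actual ad-serving code, but it shows how a small difference in user clicking behavior can snowball into a large difference in which ad version gets shown.

```python
import random

# Two versions of an ad for the same keyword; both start with equal weight.
versions = {"neutral copy": 1.0, "arrest-record copy": 1.0}

def serve_and_update(click_rates, rounds=10_000):
    weights = dict(versions)
    for _ in range(rounds):
        # Choose which version to show, proportionally to its current weight
        shown = random.choices(list(weights), weights=weights.values())[0]
        # If users click one version more often (a societal bias), its weight grows
        if random.random() < click_rates[shown]:
            weights[shown] += 1
    total = sum(weights.values())
    return {version: weight / total for version, weight in weights.items()}

# Hypothetical: users click the stereotyping version slightly more often
print(serve_and_update({"neutral copy": 0.030, "arrest-record copy": 0.045}))
```

A modest gap in click rates compounds over many impressions, so the system ends up displaying the more-clicked (here, the more stereotyping) version far more often than the other.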
More recently, a July 2020 investigation15 by two investigative reporters for the technology-oriented news site the Markup looked into Google’s Keyword Planner, which suggests terms for advertisers to associate with their ads so that the ads show up on relevant Google searches. They found that the majority of the keywords suggested for the phrases “Black girls,” “Latina girls,” and “Asian girls” were pornographic, and so were the suggestions for boys of these ethnicities, whereas for “white girls” and “white boys,” no keywords were returned at all. The investigators insightfully summarized the situation as follows: “Google’s systems contained a racial bias that equated people of color with objectified sexualization while exempting White people from any associations whatsoever. […] By not offering a significant number of non-
An even more recent investigation by the Markup uncovered racist aspects of Google’s placement of ads on YouTube (which, as you recall, is owned by Google). In June 2020, just weeks after the police murder of George Floyd, the CEO of YouTube wrote16 that “We’re committed to doing better as a platform to center and amplify Black voices and perspectives. […] At YouTube, we believe Black lives matter and we all need to do more to dismantle systemic racism.” But ten months later, in April 2021, the Markup found17 that Google was blocking “Black Lives Matter” as a search phrase for advertisers to find videos and channels to place ads on. This makes it harder for people in the movement to monetize their videos, thereby denying them a valuable revenue source, and it also prevents people who want to place ads supporting the movement from being able to reach an appropriate audience.
When a potential advertiser did a search on the Google Ads platform for the phrase “White Lives Matter” (which the Southern Poverty Law Center describes as a “racist response to the civil rights movement Black Lives Matter”), over thirty million YouTube videos were returned as possible spots for an ad; in contrast, searching “Black Lives Matter” returned zero videos. Over one hundred million videos were returned for the search “White Power,” whereas zero videos were returned for “Black Power.” After the Markup journalists contacted Google with these findings, Google blocked phrases associated with white supremacy like “White Lives Matter” and “White Power” from the ad search, but it did not unblock the corresponding Black phrases—even though the white phrases are widely considered part of hate movements, while the Black phrases are part of legitimate social justice and antiracist movements. Strangely, the Markup found that Google even started to block more search phrases like “Black in Tech” and “antiracism.” The only conceivable excuse I can imagine for blocking phrases like these and “Black Lives Matter” is that, as the Markup reporters point out, it makes it harder for critics to monetize their anti-BLM videos—but to me at least, that doesn’t justify the unequal treatment caused by these rather surreptitious algorithmic adjustments.
And back in September 2017, a BuzzFeed News investigation18 found that when advertisers tried to target audiences with certain bigoted phrases, Google not only allowed it but automatically suggested further offensive search phrases the advertiser should consider using. For instance, when the advertiser used “Why do Jews ruin everything” as the search phrase for targeting an ad, Google suggested also using “the evil Jew” and “Jewish control of banks.” These suggestions did not come from the minds of humans at Google—they were based on statistical associations that Google’s data-hungry algorithms absorbed from the massive amounts of search data that the company collects. BuzzFeed tested the system by placing an ad with these targeted search phrases, and indeed the ad went live and came up when someone searched Google for any of these phrases.
Facebook Ads and Racism
Offensive Ad Categories
In September 2017, journalists at ProPublica found19 a Facebook ad category called “Jew hater” and tried to place a sponsored post in it. Facebook’s system responded that the category was too small (it only accepts ads whose target audience is above a minimum cutoff size), so Facebook offered an algorithmically generated suggestion for a second category to jointly target: “Second Amendment.” Evidently, Facebook’s algorithm had, rather frighteningly, correlated anti-Semites with gun enthusiasts. The journalists decided instead to search for more anti-Semitic categories just to see what was available. And indeed they found others, such as “How to burn jews” and “History of ‘why jews ruin the world’.” The journalists noted that these would be excellent category choices if you wanted to, say, market Nazi memorabilia or recruit marchers for a far-right rally.
19 Julia Angwin, Madeleine Varner, and Ariana Tobin, “F…”
Algorithms Could Help Instead of Hurt
For context, and to show that algorithms are capable of both good and bad, it helps to compare Facebook’s algorithmically generated racist ad categories—both the targeted groups and the excluded groups—with an example where algorithms are used to help human moderators reduce bias in advertising.
Fast-forward to November 2017. This is just over a year since ProPublica brought attention to Facebook’s option of illegally excluding various ethnic demographics from advertisements, and two months after ProPublica brought attention to Facebook providing advertisers with the ability to target offensive and hateful interest groups. It is also nine months after Facebook announced22 that it had taken several steps to strengthen the procedures it uses to prevent discriminatory advertising, especially in the areas of housing, employment, and credit—the three areas in which federal law prohibits discriminatory ads. The Washington Post headline reporting this company announcement read “Facebook cracks down on ads that discriminate.” But ProPublica conducted another investigation at this time and found that Facebook’s supposed improvements fell far short of the bold proclamation.
Indeed, ProPublica’s November 2017 investigation23 showed that it was still possible for ad purchasers to select excluded audience categories such as “African Americans, mothers of high school kids, people interested in wheelchair ramps, Jews, expats from Argentina, and Spanish speakers.” All of these groups are protected under the federal Fair Housing Act, so Facebook was still quite clearly in violation of federal law. Facebook’s response this time? “This was a failure in our enforcement and we’re disappointed that we fell short of our commitments. The […] ads purchased by ProPublica should have but did not trigger the extra review and certifications we put in place due to a technical failure.” A remarkably vague excuse/reassurance.
Legal Action
In July 2020, the Wall Street Journal reported28 that Facebook was “creating new teams dedicated to studying and addressing potential racial bias on its core platform and Instagram unit, in a departure from the company’s prior reluctance to explore the way its products affect different minority groups.” This followed—and likely resulted from—sustained pressure from civil rights groups (including the Anti-Defamation League, Color of Change, and the NAACP), increasingly vocal employee unrest, and a months-long advertising boycott (including large, prominent clients such as Coca-Cola, Disney, McDonald’s, Starbucks, and Walmart) that cost Facebook advertising revenue and global reputation. These new equity and inclusion teams at Facebook aim to study how all of the company’s products and algorithms—not just its targeted advertising platform—impact minority users and how the user experience for Black, Hispanic, and other ethnic groups differs from that of white users. Let’s hope these teams are given the resources and respect they need to make a positive impact at the company.
28 “…After Previously Limiting Such Efforts,” Wall Street Journal, July 21, 2020.
Google is the largest advertising company in the world, and it serves ads in two ways: by placing them on its own site—for instance, to appear when users do keyword searches—and by placing them on external sites in the so-called Google Display Network. Many sites in this network have a proven track record of publishing fake news, and yet they remain in the network. The result is that Google is funneling huge sums of money to fake news sites.
8
Social Spread
Moderating Misinformation on Facebook and
Those Who Rely Primarily on Social Media
Pew surveys conducted between October 2019 and June 2020 found2 that nearly one in five US adults said they turn most to social media for political and election news—and among those under thirty years old, this figure was nearly one in two. Fewer than one in ten of those who relied primarily on social media said they were following news about the 2020 election very closely, whereas for those who relied primarily on cable TV or print news, around one in three said they were following it closely. The proportion of people who said they were following the coronavirus outbreak closely was twice as high among people who got their news primarily from cable TV or national network TV or news websites and apps as it was for people who relied primarily on social media.
The same surveys found that the percentage of people who said they were concerned about the impact of misinformation on the 2020 election was lower for the social media group than it was for all others except for people who relied primarily on local news. Being exposed to conspiracy theories is not the same as believing them—to some extent, the fact that the social media group saw more misinformation but was less concerned by it might mean that this group was more adept at telling fact from fiction. But the lower performance on the knowledge quiz undercuts this sanguine interpretation.
These both may at first seem like innocuous steps, but it is important to recognize that any time an algorithm decides what content to show you, there is a risk that misinformation will get algorithmically amplified beyond the confines in which it would normally prosper organically. Social media algorithms are typically designed to maximize various user engagement metrics, so if a harmful conspiracy theory—such as COVID-19 being a government-developed biological weapon or the 2020 election being stolen—generates a lot of engagement, then the algorithms pick up on this and display the misinformation more prominently and broadcast it to a wider audience. Needless to say, this applies not just to Twitter but also to Facebook’s newsfeed and to any other ranking and recommendation algorithms in social media.
The Wall Street Journal reported7 that from the morning of January 6, 2021 (the day of the Capitol building insurrection) to the afternoon, Facebook’s internal team of data scientists noted a tenfold increase in user-reported violent content on the platform; user reports of fake news surged to forty
In the days after the Capitol building insurrection, it was reported8 that Facebook was showing ads for weapons accessories and body armor in “patriot” and militia-themed Facebook groups alongside pro-Trump disinformation about the election being rigged and stolen. Two days later, three US Senators co-authored a public letter to Facebook founder and CEO Mark Zuckerberg urging him to take immediate action to halt these ads that they described as “designed to equip white nationalists, neo-Nazis and other domestic extremist organizations.” The next day, Facebook acquiesced and announced it was immediately halting all such ads for a week, only allowing them to return after the upcoming inauguration. But the day after this announcement, it was found that many of these dangerous ads had not been taken down. A few weeks later, Facebook began experimentally reducing the amount of political content for a sample of users9 in an effort to “turn down the temperature and discourage divisive conversations and communities,” with the aid of a machine learning algorithm trained to identify political content.
Flashback to November 19, 2016, less than two weeks after Donald Trump’s surprising election victory. Mark Zuckerberg posts10 a message on his Facebook account that begins: “A lot of you have asked what we’re doing about misinformation, so I wanted to give an update.” After saying that “we know people want accurate information,” he goes on to admit that “The problems here are complex, both technically and philosophically” and that Facebook is taking an indirect approach: “We do not want to be arbiters of
truth ourselves, but instead rely on our community and trusted third parties.” He lists several projects underway in the fight against misinformation, the first and “most important” being stronger detection: “This means better technical systems to detect what people will flag as false before they do it themselves.” Zuckerberg is implying here that out of all the approaches to limit the spread of misinformation on his platform, the primary one is developing predictive machine learning algorithms.
A curious subtlety to note here is that what he has in mind is not the supervised learning task of classifying posts as true or false, but instead a somewhat less epistemological classification into posts likely to be flagged by users versus posts unlikely to be flagged. This nuanced distinction matters: whether or not a post in a private group gets flagged as misleading depends a lot on what the focus of the group is. Regardless, the main takeaway here is that in the wake of the 2016 election, Zuckerberg was advocating a technical—and specifically, machine learning—approach to moderating his platform. In numerous public statements before and after this, Zuckerberg has maintained this philosophy that AI will be the company’s panacea when it comes to thorny societal problems like the spread of harmful misinformation. But a lot transpired between that presidential election and the next, and one of the main goals of this chapter is to discuss what algorithmic approaches Facebook actually tried and how well they worked and what other algorithmic methods might be possible.
What was the result of this experiment? People in the group where this comment promotion method was implemented were for the most part angry, confused, less able to tell what was fake versus real, and less confident in Facebook’s ability to stem the flow of misinformation. Why? Because suddenly the first thing these users saw under nearly every article about politics and politicized topics like the economy and climate change from the most reputable news organizations like BBC News, the New York Times, the Economist, etc. were comments proclaiming the story to be fake. Rather than helping people spot actual fake news, this made real news look fake and left readers in a world where nothing, and hence everything, was believable. Jen Roberts, a freelance PR consultant, captured it well when she said11 that “to question the veracity of every single story is preposterous” because this “blurs the lines between what is real and what isn’t” and turns Facebook newsfeeds into “some awful Orwellian doublethink experiment.” Suffice it to say, when this A/B test concluded, Facebook decided to trash the modification and go back to the way things were, imperfect as they were.
Sophisticated algorithms play a role in the spread of fake news on Facebook not just through the ranking of newsfeed content and recommendations for groups to join and pages to follow (the main topics of this chapter), and in political advertising (the topic of the previous chapter), but also in Facebook’s search and autocomplete features where the problems are similar to the ones with Google described in Chapter 6. A February 2019 report12 by the Guardian found that when logged in to a new user account, with no friends or other activity, typing “vaccine” into Facebook’s search bar produced autocompletes such as “vaccine re-education,” “vaccine truth movement,” and “vaccine resistance movement” that push people into the world of anti-vax misinformation. Even if the user resisted these autocomplete temptations and simply searched for “vaccination,” the top twelve Facebook groups that came up were all anti-vaccination organizations, and eight of the top twelve Facebook pages that came up were anti-vaccination pages suffused with misinformation. Several months earlier, Facebook had launched a policy of deleting misinformation designed to provoke “violence or physical harm,” but it stated that anti-vax content does not violate this or any other Facebook policy. However, that changed in February 2021 when Facebook revised its policies and announced13 that it would start removing essentially all false claims about vaccines.
Better late than never: in August 2020, the progressive nonprofit organization Avaaz released a report14 on the state of global health misinformation, including anti-vax propaganda, on Facebook throughout the preceding year—and the picture it painted was not pretty. The report estimated that content on groups and pages sharing global health misinformation received nearly four billion views during that year. Within the timeframe of the report, views of this misinformative content peaked in April when the coronavirus pandemic was spiraling out of control, despite Facebook’s concerted efforts to fight COVID-19 misinformation. The report estimated that during this April peak, content from the top ten most popular health misinformation sites collected four times as many views as did content from the top ten leading authoritative sources such as the WHO and the CDC. It also found that certain “super spreader” pages were responsible for a large fraction of the misinformation, and that many of these super spreaders had origins in the anti-vax movement. One particular article falsely claiming that the American Medical Association was encouraging doctors to overcount COVID-19 deaths received over six million comments or likes and was viewed an estimated one hundred and sixty million times. In response to this report, Facebook said15 that from April to June it applied fact-checker warning labels to nearly one hundred million COVID-19 misinformation posts and removed seven million others that the company believed risked imminent harm.
Another problem on Facebook involving algorithms and related to misinformation has been the use of bots—fake accounts that can be commanded to behave in certain ways. Bots are often used to artificially seed initial popularity in specified Facebook groups or pages through likes and shares and comments; Facebook’s recommendation and newsfeed algorithms detect this high level of engagement and mistake it for authentic activity, causing the algorithms to promote the groups/pages to real users, which in turn drives their actual popularity. As you recall, the Epoch Times used this technique in its highly successful “Facebook strategy,” even though it manifestly violates the platform’s policies against inauthentic account ownership and activity. In September 2020, a data scientist named Sophie Zhang who had been working at Facebook on a team dedicated to catching and blocking bot accounts was fired. On her last day in the office, she posted16 a lengthy internal memo to the entire company describing the terrifying scope of the platform’s
bot problem and the corporate leadership’s reluctance to heed her repeated warnings to promptly and properly respond to this problem.
These internal presentations were part of company efforts—some initiated at the very top by Mark Zuckerberg—to understand how Facebook’s algorithms influenced user behavior in potentially harmful ways. But the Wall Street Journal revealed that the findings from these efforts were to a large extent ignored, and the proposals for addressing the problems were mostly dismissed or greatly reduced in scope. The main team looking at these issues wrote in a mid-2018 internal document that many of its proposed remediations were “antigrowth” and required the company to “take a moral stance.” A particularly delicate issue was that the team found that problematic behavior such as fake news, spam, clickbait, and inauthentic users came disproportionately from hyperpartisan users—and that there were larger networks aligned with the far right than the far left. This meant that even politically neutral efforts to reduce problematic behavior would, overall, affect conservative content more than liberal content. Facebook leadership did not want to alienate conservative users with actions that appeared to reflect a liberal bias, so the company’s handling of these matters has been highly constrained.
One specific remediation proposed by this internal Facebook research team was the following. Since Facebook’s algorithms were designed to maximize various user engagement metrics (likes, shares, comments, time spent logged on, etc.), users who are very active on the platform have a greater impact on the algorithms than do less active users. The team suggested a “Sparing Sharing” algorithmic adjustment to reduce the spread of content that was disproportionately driven by hyperactive users. The team believed this would help protect Facebook from coordinated manipulation efforts, but the Wall Street Journal revealed that senior Facebook executives were apprehensive, claiming this would unfairly hurt the platform’s most dedicated users. When the team and the senior executives couldn’t agree about this, the debate over it was eventually elevated all the way up to Zuckerberg who evidently said to implement it but only after cutting the proposed mitigation weighting by eighty percent—and he reportedly “signaled he was losing interest in the effort to recalibrate the platform in the name of social good” and asked the team “that they not bring him something like that again.”
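The Wall Street Journal’s description suggests a simple idea at the core of “Sparing Sharing”: count engagement from hyperactive accounts for less. Here is a rough sketch of that idea with entirely invented numbers and account names; the actual weighting Facebook used (and the eighty percent reduction Zuckerberg reportedly imposed) has not been published.

```python
# Each share of a piece of content, tagged with how active the sharing account is
# (shares per day; hypothetical data).
shares = [
    {"user": "casual_1",    "daily_activity": 2},
    {"user": "casual_2",    "daily_activity": 3},
    {"user": "hyperactive", "daily_activity": 400},
]

def raw_score(shares):
    # Status quo: every share counts equally, so one hyperactive account
    # contributes as much as any other user.
    return len(shares)

def sparing_sharing_score(shares, typical_activity=5):
    # Downweight shares from accounts far more active than a typical user.
    return sum(min(1.0, typical_activity / s["daily_activity"]) for s in shares)

print(raw_score(shares))              # 3 shares counted at face value
print(sparing_sharing_score(shares))  # about 2.01: the hyperactive share barely counts
```

Under the status quo scoring, a small network of hyperactive accounts can make content look far more popular than it really is; the downweighted version blunts exactly that lever, which is why the team framed it as protection against coordinated manipulation.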
Platform crackdowns and fact-checks tend to get absorbed into the conspiracy theories as coordinated efforts to keep people from learning the truth—which further galvanizes support for the movements. Rather than being static, these theories are more like viruses that constantly adapt and reconfigure themselves in order to persist and spread more rampantly. The supporters of these movements actively look for messaging that allows them to escape policy violations; often while doing so, they land on softer and more moderate ways to frame their ideology—and in the long run, this allows them to reach and convince an even wider audience. It’s almost like a bacterial infection that becomes more insidious and difficult to treat after resisting a partial course of antibiotics.
You can see all these factors at play in the 2020 election. An October 2020 report18 from the Election Integrity Partnership—a self-described “coalition of research entities” supporting “information exchange between the research community, election officials, government agencies, civil society organizations, and social media platforms”—looked into preemptive efforts to delegitimize the 2020 election. It noted that the rumor spreading across social media of a deep state coup attempt to steal the election from Trump was “worth examining […] to understand how it weaves together a wide swath of discrete events into an overarching meta-narrative,” and how this “meta-narrative becomes a scaffolding on which any future event can be hung: any new protest, or newly-discovered discarded ballot, is processed as further confirmatory evidence […] that there is a vast conspiracy to steal the election, and that the results will be illegitimate.” The report goes on to explain the psychological impact, and social media dynamics, of this framework: “What may previously have been isolated incidents with minimal social media traction may gain significant new weight when they are processed as additional evidence of an underlying conspiracy.” This is strikingly similar to what you saw in Chapter 4 with YouTube where an impressionable viewer feels that all signs point to the same hidden truth when the recommendation algorithm naively strings together videos from different users on the same conspiratorial themes.
One month after Twitter announced its anti-QAnon efforts, Facebook followed suit with an announcement21 that it would start “taking action against Facebook Pages, Groups and Instagram accounts tied to offline anarchist groups that support violent acts amidst protests, US-based militia organizations and QAnon.” The announcement admitted that while content directly advocating violence was already banned on the platform, there had been a growth of movements threatening public safety in slightly more oblique manners such as celebrating violent acts or harboring members who show themselves carrying weapons with the suggestion that they will use them. It went on to explain that Facebook would still “allow people to post content that supports these movements and groups,” but it would start to “restrict their ability to organize on our platform.” In other words, QAnon content would not be prohibited from individual Facebook users, but QAnon groups and pages on Facebook would face a new collection of restrictions. These restrictions included no longer suggesting QAnon groups and pages as recommendations for users to join/follow; decreasing the newsfeed ranking for posts from QAnon groups and pages; decreasing the search ranking for QAnon groups and pages and removing their names and QAnon hashtags from the autocomplete feature in Facebook’s search function; prohibiting paid ads and Facebook’s fundraising tools for QAnon; and removing QAnon groups and pages that discuss violence—even if the discussion relies on “veiled language and symbols particular to the movement.”
Two months later, Facebook posted an update to this announcement declaring that “we believe these efforts need to be strengthened when addressing QAnon.” As of October 6, 2020, Facebook would start removing all groups and pages “representing QAnon, even if they contain no violent content.” As justification for ramping up its actions against QAnon, Facebook noted examples such as the movement spreading misinformation about the west coast wildfires that did not fall under the umbrella of inciting or even discussing violence and yet caused real public harm by impeding local officials’ ability to
fight the fires. This update to the announcement admitted that enforcing this new ban would not be a trivial matter due to how quickly QAnon pivots its messaging and that Facebook expects “renewed attempts to evade our detection, both in behavior and content shared on our platform.” A few weeks later, the announcement was updated again to say that now when people search for terms related to QAnon, they would be directed to a counter-terrorism organization. Then in January 2021 the announcement was updated once again, this time to provide current tallies for the QAnon takedown effort: over three thousand Facebook QAnon pages, ten thousand groups, five hundred events, and eighteen thousand profiles had been removed.
Many QAnon supporters shifted their emphasis to the less overtly conspiratorial child trafficking aspects of the movement. Many Facebook groups that were branded as anti-trafficking organizations but really were QAnon propaganda groups actually saw their growth rates spike after the Facebook crackdown. In short, the QAnon movement adapted to Facebook’s efforts in both technological and psychological ways to ensure it continued to prosper and spread in the new Facebook environment. With the 2020 election on the horizon and QAnon spreading pro-Trump misinformation about rampant voter fraud, Facebook evidently felt the need to ditch its original strategy and take a much stronger stance against the spiraling and spreading QAnon movement.
Now that the stage has been properly set, I’ll first look in more detail at how fake news spreads on social media, then this will inform the ensuing discussion of methods for curbing this spread.
The studies discussed so far provide an informative view of the propagation and network structure of fake news on Twitter during the 2016 election, but what’s missing is how this translates to the individual-level experiences and activities of registered voters on Twitter. Fortunately, a paper30 was published in 2019 in the top academic journal Science that attempts to fill in this crucial missing piece of the story. The researchers linked a sample of public voter registration records to sixteen thousand Twitter accounts and collected the tweets from these users—let’s call them the “voters”—between August and December 2016. They collected lists of all the users following and followed by the voters, and by sampling the tweets posted by the latter—called “exposures” since these are the tweets the voters were potentially exposed to in their newsfeeds—the researchers were able to estimate the composition of the voters’ newsfeeds. They limited their investigation to exposures containing links to political content outside of Twitter, and they used these links to provide a discrete estimate of the left-right ideology of each voter. Here’s what they found.
The voters with higher concentrations of fake news in their newsfeed were far more likely to be conservative than liberal: people seeing at least five percent fake political links made up only two and a half percent of the liberal voters but over sixteen percent of the conservative voters. The older a voter was, the higher was the proportion of fake news they saw in their newsfeed. Voters in swing states had slightly higher proportions of fake news (corroborating studies discussed earlier), as did men and whites, but the size of these effects was quite small. Among the voters classified politically as extreme left, just under five percent ever shared a fake news link; the rate for left and center users was also just under five percent, whereas for politically right users it jumped up to just under twelve percent, and more than one in five extreme right users shared a fake news link during the five months of the study period.
32 These include snopes.com, politifact.com, factcheck.org, truthorfiction.com, hoax-slayer.com, and urbanlegends.about.com.
Here are some more details on these findings. On average, false rumors reached fifteen hundred people six times faster than true rumors did. Even when controlling for various differences between the users that originated rumor cascades, such as their number of followers and whether the account was verified by Twitter, false rumors were seventy percent more likely to get retweeted than true rumors. The largest rumor cascades reached around fifty thousand users when the rumor was false but only around two thousand users when it was true. There are two very different ways that information can spread and reach a large number of users on Twitter: a prominent influencer could tweet a story that many followers will directly retweet, or a less prominent user could tweet a story that gets retweeted by a small number of followers who then get it retweeted by some of their followers, etc. Even if a story reaches the same number of retweets in these two scenarios, the first is considered a shallow spread and the second a deep spread since it penetrates more deeply into the social network. It was found in this study that not only did false rumors ultimately reach larger audiences, but they did so with much greater depth: true rumors seldom chained together more than ten layers of retweets, whereas the most viral false rumors reached twenty layers of retweets—and they did so ten times as quickly as the true rumors reached their ten.
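The distinction between shallow and deep spread is easy to formalize: think of each retweet as pointing back to the tweet it retweeted, so a cascade forms a tree, and then measure how many layers deep the tree goes versus how many users it reaches in total. Here is a minimal sketch with a made-up cascade; the tweet IDs and structure are invented for illustration.

```python
# A hypothetical retweet cascade: each entry maps a tweet to the tweet it retweeted
# (None marks the original post).
retweet_of = {
    "t0": None,                            # original rumor
    "t1": "t0", "t2": "t0", "t3": "t0",    # direct retweets (shallow spread)
    "t4": "t1", "t5": "t4", "t6": "t5",    # a chain of retweets (deep spread)
}

def depth(tweet):
    # Number of retweet hops back to the original post
    hops = 0
    while retweet_of[tweet] is not None:
        tweet = retweet_of[tweet]
        hops += 1
    return hops

cascade_size = len(retweet_of)                      # total users reached
cascade_depth = max(depth(t) for t in retweet_of)   # deepest chain of retweets
print(f"size = {cascade_size}, depth = {cascade_depth}")   # size = 7, depth = 4
```

Two cascades can have identical size but very different depth, and it was depth, the long chains of person-to-person retweets, where the false rumors stood out most.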
During the week of the 2020 election, there were three and a half million engagements (likes, shares, comments) on public Facebook posts referencing the phrase “Stop the Steal,” a slogan for the pro-Trump false claims of voter fraud.33 Six percent of these engagements occurred on the pages of four prominent influencers—Eric Trump and three conservative social media personalities—but the biggest super spreader here was Donald Trump: the twenty most engaged Facebook posts containing the word “election” were all from him, and they were all found to be false or misleading. In a four-week period surrounding November 3, President Trump and the other top twenty-five spreaders of voter fraud misinformation generated more than a quarter of the engagements on public voter fraud misinformation on Facebook that was indexed by an Avaaz investigation. Concerning the false claim that Dominion voting software deleted votes for Trump, more than a tenth of all engagements came from just seven posts. Many of the top spreaders of pro-Trump election misinformation on Facebook were also top spreaders of this same misinformation on Twitter—most notably, of course, President Trump himself.
The most detailed look so far at fake news on social media in the 2020 election is a lengthy report published34 in March 2021 by the Election Integrity Partnership that carefully tracked over six hundred pieces of election misinformation. One challenge the authors noted is that the social media app Parler is believed to have harbored a lot of election misinformation, but it does not make its data readily available, and so it is challenging for researchers to study content on Parler; similarly, Facebook’s private groups were hotbeds for misinformation, and the limited access they grant renders them difficult to study. Overall, the authors of this report found that misinformation in the 2020 election built up over a long period of time on all the social media platforms—despite efforts by the big ones to limit it—and very much exhibited the evolving meta-narrative structure discussed earlier in the context of QAnon. There were so many different forms and instances of false information about the election all pointing to a general—yet incorrect—feeling that the election would be, then was, stolen from President Trump that debunking any particular claim did little to slow down the movement, and sometimes doing so even brought more attention to the claim and further generated conspiratorial mistrust of the social media platform and/or fact-checking organization involved.
Recall that one part of the stolen election narrative was the false claim that Dominion voting machines deleted votes for Trump.
Other technical solutions are on the surface relatively simple but difficult to implement. In January 2020, Facebook announced38 a seemingly straightforward move: it would start banning deepfakes on its platform. More precisely, this was a ban on “misleading manipulated media,” meaning any video that “has been edited or synthesized—beyond adjustments for clarity or quality—in ways that aren’t apparent to an average person and would likely mislead someone into thinking that a subject of the video said words that they did not actually say” and that also “is the product of artificial intelligence or machine learning that merges, replaces or superimposes content onto a video, making it appear to be authentic.” This lengthy description really is intended to spell out that the ban is aimed at deepfakes that are used in a deceptive manner, but you saw in Chapter 3 that detection of deepfakes—whether algorithmic or manual—is, and likely always will be, quite challenging, so this ban will not
be easy to implement in practice. Also, note that this prohibition doesn’t cover shallowfakes of any kind, and for deepfakes it only covers words, not actions—so fake videos of public figures engaged in adulterous activity, for instance, are still allowed. Here’s another example of a policy decision that was far simpler to state than to implement: in October 2019, Twitter announced that it would start banning political advertising. This requires deciding exactly which ads count as political, which is no simple matter.
Many of the technical approaches that have been implemented, however, are quite sophisticated and rely extensively on machine learning. This is particularly true of Facebook, and it’s the topic I turn to next.
Facebook uses RoBERTa to help with identifying things like hate speech. This is a good example of a hybrid situation: Facebook trains a traditional supervised learning classifier on a collection of posts that have been manually labeled as hateful or not hateful, but rather than having the algorithm work directly with the text in these posts, it instead works with the text after the self-supervised RoBERTa has transformed the words in these posts into numerical vectors. Wherever there's a situation in which the tech giants need a computer to be able to deal with the meaning of language, especially potentially subtle uses of language, you can assume that they now use a massive self-supervised language model like BERT or RoBERTa and that doing so has drastically improved performance over past approaches. That said, hate speech is generally much more self-apparent, even to a computer, than something like misinformation that depends heavily on context and background knowledge.
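To make this hybrid setup concrete, here is a minimal sketch in Python using the open source Hugging Face transformers library together with scikit-learn. The model name, the toy labeled posts, and the choice of classifier are my own illustrative assumptions, not details of Facebook's production system.

# Minimal sketch of a hybrid pipeline: a self-supervised language model
# (RoBERTa) turns each post into a numerical vector, and a traditional
# supervised classifier is then trained on manually labeled examples.
# Illustrative only; this is not Facebook's actual system.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    # Convert a list of posts into fixed-length vectors by mean-pooling
    # RoBERTa's final hidden states.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden.mean(dim=1).numpy()

# Hypothetical manually labeled training posts (1 = violates policy, 0 = benign).
posts = ["example post that violates the policy", "example post that is fine"]
labels = [1, 0]

classifier = LogisticRegression().fit(embed(posts), labels)
print(classifier.predict(embed(["a new post to be screened"])))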
Although Facebook relies heavily on external fact-checking organizations to evaluate misinformation, as discussed briefly later in this chapter and more extensively in the next chapter, machine learning still plays multiple important roles in that process.
As you recall from earlier in this chapter, bot activity on social media has long been a significant problem. As you also recall from this chapter, bot behavior tends to have quantitatively distinct patterns from human behavior—such as bursts of activity that far outpace what a human could achieve. When it comes to algorithmic detection, this powerful efficiency of bots is their own undoing: not only can detection algorithms look for direct signs of bots in metadata, but the algorithms can also look for behavioral differences. It is relatively easy to program a bot to share articles and even to write simple human-sounding posts and comments, but to fly under the radar, one needs to ensure that the bot does this at the approximate frequency and scope of a human user. And in the case of Facebook, there’s another important factor involved: the friendship network. What has proven most challenging when it comes to creating bots that simulate human behavior is developing a Facebook account with a realistic-looking network of friends40—and doing this in large enough numbers for an army of bots to have a significant impact.
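As a toy illustration of how behavioral signals can separate bots from humans, here is a short Python sketch using scikit-learn. The features, the handful of made-up accounts, and the labels are all hypothetical; real detection systems at the platforms use far richer signals, including metadata and the friendship network mentioned above.

# Illustrative sketch: flagging likely bots from behavioral features such as
# posting rate, burstiness, and size of the friendship network. The feature
# choices and training data here are hypothetical, not any platform's real system.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [posts per day, max posts in any single hour, number of friends]
accounts = np.array([
    [8,    3,   250],   # typical human
    [5,    2,   400],   # typical human
    [900,  300,   4],   # bursts of activity, almost no friends: bot-like
    [1200, 450,   1],   # bot-like
])
is_bot = np.array([0, 0, 1, 1])  # labels from a manually reviewed sample

detector = RandomForestClassifier(n_estimators=100, random_state=0)
detector.fit(accounts, is_bot)

# Estimated probability that a new account posting 600 times a day,
# 200 times in a single hour, with 3 friends, is a bot.
print(detector.predict_proba([[600, 200, 3]])[0][1])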
In November 2020, Facebook also announced43 a new machine learning approach for ordering the queue of posts that are flagged for review by human moderators. Previously, posts flagged for human review for potentially violating Facebook’s policies (which includes both posts flagged by users and posts that triggered the algorithmic detection system but didn’t score above the threshold for automatic removal) were reviewed by human moderators mostly in the order in which they were flagged. The new approach uses machine learning to determine the priority of posts in the queue so that the most urgent and damaging ones are addressed first. The main factors the algorithm considers are virality, severity, and likelihood of violating a policy—but the ways these are measured and weighed against each other were not revealed, outside of saying that real-world harm is considered the most important.
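Since Facebook did not reveal how virality, severity, and likelihood of violation are measured or weighed against each other, the following Python sketch simply invents a weighted combination of the three, with severity weighted most heavily, to illustrate how such a prioritized queue could work in principle.

# Hypothetical priority score for a moderation queue, combining virality,
# severity, and likelihood of violating policy. Facebook has not disclosed
# how it measures or weighs these factors; the weights below are invented
# purely to illustrate the idea.
def priority(virality, severity, violation_probability):
    # All inputs are assumed to be normalized to the range 0-1, with severity
    # (expected real-world harm) weighted most heavily.
    return 0.5 * severity + 0.3 * virality + 0.2 * violation_probability

queue = [
    {"post_id": "a", "virality": 0.9, "severity": 0.2, "violation_probability": 0.6},
    {"post_id": "b", "virality": 0.3, "severity": 0.9, "violation_probability": 0.7},
]
# Review the most urgent posts first rather than in the order they were flagged.
queue.sort(key=lambda p: priority(p["virality"], p["severity"], p["violation_probability"]),
           reverse=True)
print([p["post_id"] for p in queue])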
Twitter’s Bot Detection
How Algorithms Could Help
In this section, I'll start with fake news mitigation methods based on broader structural ideas for reengineering social media networks; then I'll turn to more down-to-earth methods that rely on the way fake news spreads through social networks as they currently operate.
A frequently discussed issue with social media is “filter bubbles,” the idea that people get funneled into homogeneous networks of like-minded users sharing content that tends to reinforce preexisting viewpoints; this can create a more divisive, polarized society, and in extreme cases it may even lead to people living in different perceived realities from each other. Some researchers have proposed algorithmic methods for bursting, or at least mitigating, these social media filter bubbles. One particular approach48 is to first assign a vector of numbers to each user and to each social media post (or at least each news link shared) that provides a quantitative measure of various dimensions such as political alignment. A filter bubble is reflected by a collection of users whose vectors are similar to each other and who tend to post content with similar vectors as well. With this setup, researchers designed an algorithm to prioritize diverse content to a select group of users who are deemed likely to share it and help it spread across the social network, penetrating filter bubbles as it goes. The researchers’ particular strategy for selecting the users to seed this spread was found to be three times more effective at increasing the overall diversity of newsfeeds in the network than a simpler approach of just
targeting diverse content to the most well-connected users. The researchers do admit that this diversity-oriented newsfeed algorithm would not maximize engagement the way the current social media algorithms do, and I cannot help but wonder how challenging it would be to assign these numerical ideology vectors in the real world.
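To give a feel for the setup, here is a toy Python sketch in which each shared post carries a single ideology score instead of a full vector, a feed's diversity is measured as the spread of those scores, and seed users are chosen by a crude heuristic. This is a simplified stand-in for the researchers' actual algorithm, and every number in it is invented.

# Toy illustration of the vector setup behind the filter-bubble work described
# above: each shared post gets an ideology score in [-1, 1] (a single number
# standing in for a full vector), and a feed's diversity is the spread of the
# scores it contains.
import statistics

user_feeds = {
    "alice": [-0.9, -0.8, -0.85],   # homogeneous left-leaning feed
    "bob":   [0.7, 0.8, 0.72],      # homogeneous right-leaning feed
    "carol": [-0.6, 0.1, 0.7],      # already fairly diverse feed
}

def feed_diversity(scores):
    # Standard deviation of ideology scores: higher means a more diverse feed.
    return statistics.pstdev(scores)

# A crude seeding heuristic: target the users whose feeds are least diverse,
# since injecting cross-cutting content there has the most room to help.
seeds = sorted(user_feeds, key=lambda u: feed_diversity(user_feeds[u]))[:2]
print(seeds)  # ['alice', 'bob']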
Another method researchers have suggested (which, like the previous method, involves a fairly significant reenvisioning of how social networks should operate in order to reduce polarization) is based around the idea of decentralization. Many social networks naturally form a collection of hubs around highly influential users. If you think of each of these influencers as the center of a bicycle wheel shape with spokes emanating out to the followers, the network will look like a bunch of bicycle wheels with some but not many connections between the different wheels. A consequence of this network structure is that ideas and viewpoints tend to emerge from the limited number of central influencers and percolate outward, but they have difficulty crossing from one bicycle wheel to the next. In the case of Donald Trump on Twitter, there was a massive bicycle wheel with him at the center that encompassed much of the Republican user base on the platform and possibly impeded the flow of diverse perspectives within this user base. The general idea of decentralizing a social network is to reengineer it so that these bicycle-wheel-shaped circles of influence are less likely to emerge. This can also be seen as creating egalitarian networks in which users have less heavily imbalanced influence on each other.
Research on the effects of centralized networks at present far outpaces research on how to prevent social media networks from becoming centralized, at least without explicitly deciding who people should be friends with and/or limiting the number of followers they can have.
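For readers who like to quantify such things, here is a small Python sketch using the networkx library that compares a pure hub-and-spoke network with an egalitarian one via Freeman's degree centralization, which equals 1 for a perfect star around a single influencer and 0 when everyone has the same number of connections. The example networks are toys of my own invention, not a proposal from the research discussed above.

# Comparing a hub-and-spoke ("bicycle wheel") network with an egalitarian one
# using Freeman's degree centralization. Purely illustrative.
import networkx as nx

def degree_centralization(g):
    n = g.number_of_nodes()
    degrees = [d for _, d in g.degree()]
    max_d = max(degrees)
    return sum(max_d - d for d in degrees) / ((n - 1) * (n - 2))

wheel = nx.star_graph(99)          # one influencer with 99 followers
egalitarian = nx.cycle_graph(100)  # everyone connected to two peers

print(degree_centralization(wheel))        # 1.0
print(degree_centralization(egalitarian))  # 0.0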
Fake News Detection
In the ramp-up to the 2020 election, Section 230 caught the skeptical eye of people of all political persuasions: many on the left said it allows Google and the social media companies to serve up harmful extremist content and misinformation without any culpability, while many on the right said it allows these companies to selectively censor conservatives and stifle free speech. In the spring of 2020, Twitter started labeling false and misleading tweets by President Trump about voter fraud; Trump responded by calling for a total revocation of Section 230. Just prior to that, Joe Biden also called for the revocation of Section 230, but his reason was that it allows companies like Facebook to "propagate falsehoods they know to be false."58 Even Mark Zuckerberg has expressed support for some revisions to Section 230, though Senator Wyden, one of the co-authors of the original 1996 law, openly questioned Zuckerberg's intentions in this regard:59 "He made his money, and now he wants to pull up the ladder behind him. The fact that Facebook, of all companies, is calling for changes to 230 makes you say, 'Wait a second'."
58 Emily Bazelon, "The Problem of Free Speech in an Age of Disinformation," New York Times Magazine, October 13, 2020.
Concluding Thoughts
In 2018, Facebook announced plans for a new Oversight Board to help deliberate and adjudicate matters concerning the platform’s influence on public discourse. The board, often called “Facebook’s Supreme Court,” comprises a range of international experts—from top scholars in media studies, law, and public policy to leaders of human rights organizations and think tanks, and even a Nobel Peace Prize winner and a former prime minister of Denmark. As an example of the board’s activities, it was tasked with determining whether the indefinite suspension of Trump’s Facebook account was justified and whether it should continue.
However, it was recently pointed out63 by two scholars at Columbia University’s free speech institute that the board is so limited in scope—it focuses almost exclusively on individual instances of content removal—that it is in some sense a façade. They note that specific questions of content moderation are important, but far more consequential are the decisions the company makes about the design of its platform and the algorithms that power it: “[Facebook’s] ranking algorithms determine which content appears at the top of users’ news feeds. Its decisions about what types of content can be shared, and how, help determine which ideas gain traction. [...] The board has effectively been directed to take the architecture of Facebook’s platform as a given.”
In other words, the board provides the public impression of external regulation and a dedication to mitigating ill effects on society, but the problem of harmful content spreading on social media and the question of how to moderate it run much deeper than decisions about individual pieces of content. The real discussion must involve investigations into, and possibly a vast rethinking of, the current algorithmic approach to maximizing user engagement. You have seen in this chapter that there are already many insightful investigations into algorithmic amplification of harmful content and even some promising ideas for redesigning the structure of social networks to counteract this. These questions of algorithmic design are central to the way forward, but Facebook has conveniently left all decisions concerning them in its own hands and out of the purview of its Supreme Court.
Live video adds yet another layer of difficulty: it lets users broadcast messages, sometimes to enormous audiences, and a live video feed clearly poses numerous vexing technical obstacles when it comes to moderation.
Karen Hao, a technology journalist who has for several years written articles exploring bias in, and unintended consequences of, machine learning algorithms, published an article65 in March 2021 that she described66 as "The hardest and most complex story I've ever worked on." The title of the article? "How Facebook Got Addicted to Spreading Misinformation." She said that "Reporting this thoroughly convinced me that self-regulation does not, cannot work" and that the article is "not about corrupt people [doing] corrupt things […] it's about good people genuinely trying to do the right thing. But they're trapped in a rotten system trying their best to push the status quo that won't budge."
Tools for Truth
Fact-Checking Resources for Journalists and You
In this section, I describe a handful of online tools that can assist with fact-checking in various ways and that involve machine learning algorithms to varying extents. This is only a sample of what's out there, and some useful products not discussed here helpfully combine multiple approaches into a single user-friendly tool. My main goal here is not to provide a comprehensive list of software packages, for such a list would surely become outdated quite quickly.
The BERT encodings are also used to estimate whether each identified claim matches one in the archive of previously fact-checked claims. Unsurprisingly, most claims don’t just appear once, they appear in many different locations and guises, so it saves an enormous amount of time to check each claim in substance once rather than checking every instance and minor variation of it. Machine learning is also used to identify the claims that most urgently require fact-checking each day, based on current events and other factors. This hybrid approach in which machine learning assists human reviewers, rather than replacing them, is very sensible; it is, as you may recall from the previous chapter, similar in spirit to Facebook’s approach to content moderation. Full Fact directly states on its web page that “Humans aren’t going anywhere anytime soon—and nor would we want them to be.”
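Full Fact's own models are not public, so the following Python sketch uses the open source sentence-transformers library as a stand-in encoder, comparing a newly extracted claim against an archive of previously checked claims by cosine similarity. The model name, the example claims, and the similarity threshold are all illustrative assumptions.

# Sketch of matching a new claim against an archive of previously fact-checked
# claims using sentence embeddings and cosine similarity. Illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

archive = [
    "The MMR vaccine causes autism",
    "Unemployment fell to its lowest level in 45 years",
]
new_claim = "Vaccines for measles, mumps and rubella are a cause of autism"

scores = util.cos_sim(model.encode(new_claim, convert_to_tensor=True),
                      model.encode(archive, convert_to_tensor=True))[0]
best = int(scores.argmax())
if float(scores[best]) > 0.6:  # similarity threshold chosen arbitrarily here
    print("Likely already fact-checked:", archive[best])
else:
    print("Route to human fact-checkers")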
Logically
The aim, as the company poetically puts it, is to "supplement human intelligence, not supplant it." In broad strokes, the fact-checking service works as follows. First, a user submits a link to an article or post, and Logically uses machine learning to identify the key claims in it, similar to what we saw with Full Fact except that here the claims are not classified; instead, the user is prompted to select one of the identified claims to focus on. Next, machine learning is used to search for evidence and previous fact-checks by Logically related to the selected claim. If a close fact-check match is found, then no human intervention is needed; otherwise, the claim is sent to the human fact-checking team, and a full report is returned (and added to the database of completed fact-checks) once it is ready.
Mike Tamir, head of data science at Uber's self-driving car division, produced a free tool called FakerFact5 for analyzing passages of text that users either paste in or provide a link to. While sometimes billed as an online fact-checker, it does not actually verify individual claims; instead, it uses machine learning to judge what kind of writing a passage is, attaching labels such as journalism, opinion, satire, or sensational.
However, when teaching a data science class one semester, I assigned my students to experiment with FakerFact, and we found the results quite unreliable—and often laughable. One student noted that the US Constitution was labeled opinion and satire. When I pasted in the first chapter of this book, FakerFact said “this one sounds silly” and also deemed it opinion and satire. Just now, I fed FakerFact the first link on the New York Times, which was an article6 about a COVID vaccine trial, and it was labeled sensational. Curiously, some of the passages highlighted as influential for this decision actually made sense: “Miami-Dade County, which includes Miami Beach, has recently endured one of the nation’s worst outbreaks, and more than 32,000 Floridians have died from the virus, an unthinkable cost that the state’s leaders rarely acknowledge.” But others don’t seem the least bit sensational: “Two-thirds of participants were given the vaccine, with doses spaced four weeks apart, and the rest received a saline placebo.” Let’s hope Uber’s self-driving cars are a little more accurate than this text analysis tool.
Waterloo’s Stance Detection
They used a data set of fifty thousand articles to train a deep learning classification algorithm that looks at the body of each article and the headline of the article and estimates whether the body agrees with the headline, disagrees with it, discusses it without taking a stance, or is unrelated to the headline. Their algorithm scored a very respectable ninety percent accuracy, which was considerably higher than previous attempts by other researchers. The main insight in their work was to start with Facebook’s massive pre-trained deep learning algorithm RoBERTa and then do additional focused training to fine-tune the algorithm for the specific task at hand. This general process of fine-tuning a massive pre-trained deep learning algorithm is called “transfer learning,” and it has been an extremely successful method in AI, so it is not at all surprising that this is the right way to go when it comes to stance detection—it just wasn’t possible before BERT and RoBERTa came out.
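To show what this kind of transfer learning looks like in code, here is a generic Python sketch using the Hugging Face transformers library: a pre-trained RoBERTa model gets a four-way classification head and is fine-tuned on (headline, article body) pairs labeled agree, disagree, discuss, or unrelated. This is not the Waterloo team's actual code, and the two training examples are placeholders standing in for a data set of tens of thousands of articles.

# Generic transfer-learning sketch for stance detection: fine-tune a
# pre-trained RoBERTa model as a four-way classifier on headline/body pairs.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

labels = ["agree", "disagree", "discuss", "unrelated"]
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=len(labels))

# Hypothetical labeled examples; a real data set would have tens of thousands.
pairs = [("Vaccine shown effective in trial", "The trial results confirm that ...", 0),
         ("Vaccine shown effective in trial", "A celebrity was spotted at the beach ...", 3)]

class StanceData(torch.utils.data.Dataset):
    def __len__(self):
        return len(pairs)
    def __getitem__(self, i):
        headline, body, label = pairs[i]
        enc = tokenizer(headline, body, truncation=True,
                        padding="max_length", max_length=256)
        enc = {k: torch.tensor(v) for k, v in enc.items()}
        enc["labels"] = torch.tensor(label)
        return enc

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="stance", num_train_epochs=1),
                  train_dataset=StanceData())
trainer.train()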
I don’t believe a prototype of this Waterloo stance detection method is publicly available yet, and to really be effective we need progress on the document retrieval step as well. Nonetheless, it is promising work that may well find itself in user-friendly software in the near future.
The researchers’ framework in principle applies to all kinds of medical and scientific assertions—not just COVID-related ones—but since training their algorithm involves careful manual work with the data, they decided to launch this tool initially in a limited setting and scope. The reader is cautioned, and the researchers readily admit, that this tool is largely intended to show what is possible and what challenges remain, rather than to be blindly trusted. They tested a few dozen COVID-19 assertions and found the algorithm returned relevant papers and correctly identified their stance about two-thirds of the time. While this tool should not replace finding and reading papers manually, it might still help with a quick first-pass assessment of medical claims.
Another massive knowledge graph is being assembled by a startup called Diffbot10 that has been scraping the entire public Web for facts. (Diffbot, Google, and Microsoft are supposedly the only three companies known to crawl the entire public Web for any purpose.) It has been adding over a hundred million entities per month. Diffbot offers a handful of services based on its knowledge graph, and the CEO said11 that he eventually wants to use it to power a "universal factoid question answering system." It is curious that he used the term "factoid" here, which in general usage can refer to either a snippet of factual information or a statement that is repeated so often that it becomes accepted as common knowledge whether or not it is actually true. I suspect Diffbot is actually capturing the latter, because it draws from all of the Web rather than just using certain vetted sources as Google does. And this worries me. We all know not to trust everything we read on the Web, so why should we trust an algorithm that gathered all of its knowledge by reading the Web? I'm optimistic that the developers at Diffbot have attempted to include only accurate information in their knowledge graph, but I'm far less optimistic that they've been successful in this regard. This project strikes me as valuable, but I would hesitate to treat it as a reliable arbiter of what is true.
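For readers unfamiliar with the term, a knowledge graph at its simplest is just a large collection of (subject, relation, object) facts that can be queried. The tiny Python example below, with facts and a lookup function invented purely for illustration, shows the idea; Diffbot's actual graph and query tools are vastly larger and more sophisticated.

# A miniature knowledge graph stored as (subject, relation, object) triples,
# just to illustrate the data structure behind factoid question answering.
triples = {
    ("Eiffel Tower", "located in", "Paris"),
    ("Paris", "capital of", "France"),
}

def lookup(subject, relation):
    # Return every object linked to the subject by the given relation.
    return [o for s, r, o in triples if s == subject and r == relation]

print(lookup("Eiffel Tower", "located in"))  # ['Paris']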
Twitter Bot Detection
Bot Sentinel12 is essentially a public-facing counterpart to the kind of bot detection systems that Twitter uses internally to automatically detect bot accounts. It uses machine learning techniques (along the lines discussed in the previous chapter) to detect bots on Twitter and lets users freely explore a database of these detected accounts and track their activity. Botometer13 is a related free tool that lets you type in a specific Twitter username and then applies machine learning classification methods (again, like the ones discussed last chapter) to estimate whether that account is a bot or a human; it also provides bot-versus-human estimates on all the followers of the specified account. BotSlayer14 is a free browser extension that helps users detect and track hashtags and other information spreading across Twitter in a coordinated manner suggestive of a bot campaign.
Additional Tools
Hoaxy15 is a free keyword search tool that builds interactive network visualizations for the diffusion across Twitter of claims that have been fact-checked by one of the main fact-checking sites. TheFactual16 is a free mobile app and browser extension that uses machine learning to estimate the quality of news articles; it does this by combining a few different estimated quantities pertaining to the article, such as a reputation score for the journalist(s), an NEQ-type score for the publisher, and a measure of how opinionated the article’s language is.
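The Factual has not published its exact formula, so the following Python sketch simply invents a weighted combination of a journalist reputation score, a publisher score, and a crude measure of opinionated language to illustrate how such a composite quality score could be assembled. The weights and the word list are hypothetical.

# Hypothetical composite quality score in the spirit of tools like TheFactual,
# combining journalist reputation, publisher quality, and how opinionated the
# language is. The weights and opinion word list are invented for illustration.
OPINION_WORDS = {"outrageous", "disgraceful", "amazing", "disaster", "hoax"}

def opinion_score(text):
    words = text.lower().split()
    return sum(w.strip(".,!?") in OPINION_WORDS for w in words) / max(len(words), 1)

def article_quality(journalist_reputation, publisher_score, text):
    # All inputs in 0-1; more opinionated language lowers the score.
    return (0.4 * journalist_reputation + 0.4 * publisher_score
            + 0.2 * (1 - opinion_score(text)))

print(article_quality(0.8, 0.9, "Officials released the report on Tuesday."))
print(article_quality(0.8, 0.9, "This outrageous hoax is a total disaster!"))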
Independent fact-checking organizations such as PolitiFact offer fact-checks of individual YouTube videos, though you have to access these from PolitiFact's website21 rather than directly through YouTube.
This chapter opened with a list of publicly available tools that use machine learning to assist with fact-checking tasks, and it closed with a brief discussion of the fact-checking tools and activities at Google, YouTube, Facebook, and Twitter. Now you can go forth and do your own part in the fight against fake news!
Index
B
Backdating, 141
BERT, 202
Bharatiya Janata Party (BJP), 53
Bidirectional Encoder Representations from Transformers (BERT), 142

D
Deception detection, 113
Deception signals, 114
Discern Science International (DSI), 114
Dominion voting machine, 199

E
Epoch Times, 178
External fact-checking organizations, 202

F
Fact-checking tools
  Facebook, 227, 228
  Google, 225, 226
  online tools
    Diffbot, 223
    Factual, 225
    FakerFact, 219, 220
    Full Fact, 218
    Hoaxy, 225

G
Gateway Pundit, 199
Generative adversarial networks (GANs), 41
Global Disinformation Index (GDI), 156, 157
Google Maps, 123
  automated systems, 125
  businesses, 124
  content moderation, 126
Google's ad
  direct method, 153
  financial incentive, 153
  indirect method, 153
  racism, 159–161
  2017 report, 154–156
  2019 report, 156
  2021 report, 157, 158
  revenue, 152

H
Housing and Urban Development (HUD), 165

K
Known terrorist sympathizers, 18

M
Modern polygraph, 104
Movimento Brasil Livre (MBL), 78, 79

Q
QAnon movement, 186–189

R
Recommendation/ranking algorithm, 70, 180
Red-pilling, 69
Reenactment, 50
Reinforcement learning, 74, 81
RoBERTa, 201, 209
Rumors, 195

T
Tweeting/retweeting network, 191
Twitter, 228
  bot accounts, 192, 193
  bot detection, 204
  bot-driven, 192
  2016 election, 191
  2020 election, 197
  fake news exposure, 194, 195
  fake news links, 191
  geographic distribution, 193
  human activity, 192
  individual-level experiences, 194
  left-leaning news, 191
  left-wing news, 192
  lies, 195
  machine learning, 196
  misinformation, 198
  political advertising, 201
  political orientation, 191
  replies and retweets, 196
  rumors, 195–197
  supervised learning algorithm, 192
  tools, 196
  traditional news, 191
  voters, 194

V
Vector representation, 143
Video-specific predictors, 73

W
Whole Post Integrity Embeddings (WPIE), 202

Y
YouTube
  Brazil, 77
  conspiracy theory, 80, 81
  far-right content, 79
  political influence, 78, 79
