Inside the fight to reclaim AI from Big Tech’s control

Timnit Gebru never thought a scientific paper would cause her so much trouble. 

In 2020, as the co-lead of Google’s ethical AI team, Gebru had reached out to Emily Bender, a linguistics professor at the University of Washington, and the two decided to collaborate on research about the troubling direction of artificial intelligence. Gebru wanted to identify the risks posed by large language models, one of the most stunning recent breakthroughs in AI research. The models are algorithms trained on staggering amounts of text. Under the right conditions, they can compose what look like convincing passages of prose.
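The idea behind these models can be illustrated with a deliberately tiny sketch: count which words follow which in a corpus, then sample from those counts to generate new text. Real large language models use neural networks trained on billions of words rather than a lookup table, but the generate-by-predicting-the-next-word principle is the same. The corpus and code here are purely hypothetical:

```python
import random
from collections import defaultdict

# A toy bigram "language model": record which word follows which,
# then generate text by repeatedly sampling a plausible next word.
corpus = "the cat sat on the mat and the cat saw the dog".split()

next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

random.seed(1)
word = "the"
generated = [word]
for _ in range(6):
    word = random.choice(next_words[word])
    generated.append(word)
    if word not in next_words:  # no observed successor: stop
        break

print(" ".join(generated))
```

Every word the toy model emits follows its predecessor somewhere in the training text, which is why the output looks locally fluent while understanding nothing.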

For a few years, tech companies had been racing to build bigger versions and integrate them into consumer products. Google, which invented the technique, was already using one to improve the relevance of search results. OpenAI announced the largest one, called GPT-3, in June 2020 and licensed it exclusively to Microsoft a few months later.

Gebru worried about how fast the technology was being deployed. In the paper she wound up writing with Bender and five others, she detailed the possible dangers. The models were enormously costly to create—both environmentally (they require huge amounts of computational power) and financially; they were often trained on the toxic and abusive language of the internet; and they’d come to dominate research in language AI, elbowing out promising alternatives. 

Like other existing AI techniques, the models don’t actually understand language. But because they can manipulate it to retrieve text-based information for users or generate natural conversation, they can be packaged into products and services that make tech companies lots of money.

That November, Gebru submitted the paper to a conference. Soon after, Google executives asked her to retract it, and when she refused, they fired her. Two months later, they also fired her coauthor Margaret Mitchell, the other leader of the ethical AI team.

The dismantling of that team sparked one of the largest controversies within the AI world in recent memory. Defenders of Google argued that the company has the right to supervise its own researchers. But for many others, it solidified fears about the degree of control that tech giants now have over the field. Big Tech is now the primary employer and funder of AI researchers, including, somewhat ironically, many of those who assess its social impacts.

Among the world’s richest and most powerful companies, Google, Facebook, Amazon, Microsoft, and Apple have made AI core parts of their business. Advances over the last decade, particularly in an AI technique called deep learning, have allowed them to monitor users’ behavior; recommend news, information, and products to them; and most of all, target them with ads. Last year Google’s advertising apparatus generated over $140 billion in revenue. Facebook’s generated $84 billion.

The companies have invested heavily in the technology that has brought them such vast wealth. Google’s parent company, Alphabet, acquired the London-based AI lab DeepMind for $600 million in 2014 and spends hundreds of millions a year to support its research. Microsoft signed a $1 billion deal with OpenAI in 2019 for commercialization rights to its algorithms.

At the same time, tech giants have become large investors in university-based AI research, heavily influencing its scientific priorities. Over the years, more and more ambitious scientists have transitioned to working for tech giants full time or adopted a dual affiliation. From 2018 to 2019, 58% of the most cited papers at the top two AI conferences had at least one author affiliated with a tech giant, compared with only 11% a decade earlier, according to a study by researchers in the Radical AI Network, a group that seeks to challenge power dynamics in AI.

The problem is that the corporate agenda for AI has focused on techniques with commercial potential, largely ignoring research that could help address challenges like economic inequality and climate change. In fact, it has made these challenges worse. The drive to automate tasks has cost jobs and led to the rise of tedious labor like data cleaning and content moderation. The push to create ever larger models has caused AI’s energy consumption to explode. Deep learning has also created a culture in which our data is constantly scraped, often without consent, to train products like facial recognition systems. And recommendation algorithms have exacerbated political polarization, while large language models have failed to clean up misinformation. 

It’s this situation that Gebru and a growing movement of like-minded scholars want to change. Over the last five years, they’ve sought to shift the field’s priorities away from simply enriching tech companies, by expanding who gets to participate in developing the technology. Their goal is not only to mitigate the harms caused by existing systems but to create a new, more equitable and democratic AI. 

“Hello from Timnit”

In December 2015, Gebru sat down to pen an open letter. Halfway through her PhD at Stanford, she’d attended the Neural Information Processing Systems conference, the largest annual AI research gathering. Of the more than 3,700 researchers there, Gebru counted only five who were Black.

Once a small meeting about a niche academic subject, NeurIPS (as it’s now known) was quickly becoming the biggest annual AI job bonanza. The world’s wealthiest companies were coming to show off demos, throw extravagant parties, and write hefty checks for the rarest people in Silicon Valley: skillful AI researchers.

That year Elon Musk arrived to announce the nonprofit venture OpenAI. He, Y Combinator’s then-president Sam Altman, and PayPal cofounder Peter Thiel had put up $1 billion to solve what they believed to be an existential problem: the prospect that a superintelligence could one day take over the world. Their solution: build an even better superintelligence. Of the 14 advisors or technical team members he anointed, 11 were white men.


While Musk was being lionized, Gebru was dealing with humiliation and harassment. At a conference party, a group of drunk guys in Google Research T-shirts circled her and subjected her to unwanted hugs, a kiss on the cheek, and a photo.

Gebru typed out a scathing critique of what she had observed: the spectacle, the cult-like worship of AI celebrities, and most of all, the overwhelming homogeneity. This boys’ club culture, she wrote, had already pushed talented women out of the field. It was also leading the entire community toward a dangerously narrow conception of artificial intelligence and its impact on the world.

Google had already deployed a computer-vision algorithm that classified Black people as gorillas, she noted. And the increasing sophistication of unmanned drones was putting the US military on a path toward lethal autonomous weapons. But there was no mention of these issues in Musk’s grand plan to stop AI from taking over the world in some theoretical future scenario. “We don’t have to project into the future to see AI’s potential adverse effects,” Gebru wrote. “It is already happening.”

Gebru never published her reflection. But she realized that something needed to change. On January 28, 2016, she sent an email with the subject line “Hello from Timnit” to five other Black AI researchers. “I’ve always been sad by the lack of color in AI,” she wrote. “But now I have seen 5 of you 🙂 and thought that it would be cool if we started a black in AI group or at least know of each other.”

The email prompted a discussion. What was it about being Black that informed their research? For Gebru, her work was very much a product of her identity; for others, it was not. But after meeting they agreed: If AI was going to play a bigger role in society, they needed more Black researchers. Otherwise, the field would produce weaker science—and its adverse consequences could get far worse.

A profit-driven agenda

As Black in AI was just beginning to coalesce, AI was hitting its commercial stride. That year, 2016, tech giants spent an estimated $20 to $30 billion on developing the technology, according to the McKinsey Global Institute.

Heated by corporate investment, the field warped. Thousands more researchers began studying AI, but they mostly wanted to work on deep-learning algorithms, such as the ones behind large language models. “As a young PhD student who wants to get a job at a tech company, you realize that tech companies are all about deep learning,” says Suresh Venkatasubramanian, a computer science professor who now serves at the White House Office of Science and Technology Policy. “So you shift all your research to deep learning. Then the next PhD student coming in looks around and says, ‘Everyone’s doing deep learning. I should probably do it too.’”

But deep learning isn’t the only technique in the field. Before its boom, there was a different AI approach known as symbolic reasoning. Whereas deep learning uses massive amounts of data to teach algorithms about meaningful relationships in information, symbolic reasoning focuses on explicitly encoding knowledge and logic based on human expertise. 
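As a rough illustration of the symbolic approach, knowledge can be written down as explicit facts and a hand-coded inference rule, with no training data at all. The facts and rule below are invented for the example:

```python
# Symbolic reasoning in miniature: knowledge is encoded by hand as facts
# and a logical rule, rather than learned from massive amounts of data.
facts = {("Ada", "parent_of", "Ben"), ("Ben", "parent_of", "Cara")}

def grandparents(facts):
    """Apply one explicit rule:
    parent_of(x, y) and parent_of(y, z)  =>  grandparent_of(x, z)."""
    derived = set()
    for (x, rel1, y) in facts:
        for (y2, rel2, z) in facts:
            if rel1 == rel2 == "parent_of" and y == y2:
                derived.add((x, "grandparent_of", z))
    return derived

print(grandparents(facts))  # {('Ada', 'grandparent_of', 'Cara')}
```

Because the rule is explicit, the system's conclusions are auditable and guaranteed, but someone has to write every rule by hand, which is what made the approach hard to scale and deep learning attractive.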

Some researchers now believe those techniques should be combined. The hybrid approach would make AI more efficient in its use of data and energy, and give it the knowledge and reasoning abilities of an expert as well as the capacity to update itself with new information. But companies have little incentive to explore alternative approaches when the surest way to maximize their profits is to build ever bigger models. 

In their paper, Gebru and Bender alluded to a basic cost of this tendency to stick with deep learning: the more advanced AI systems we need are not being developed, and similar problems keep recurring. Facebook, for example, relies heavily on large language models for automated content moderation. But without really understanding the meaning behind text, those models often fail. They regularly take down innocuous posts while giving hate speech and misinformation a pass.

AI-based facial recognition systems suffer from the same issue. They’re trained on massive amounts of data but see only pixel patterns—they do not have a grasp of visual concepts like eyes, mouths, and noses. That can trip these systems up when they’re used on individuals with a different skin tone from the people they were shown during training. Nonetheless, Amazon and other companies have sold these systems to law enforcement. In the US, they have caused three known cases of police jailing the wrong person—all Black men—in the last year.

For years, many in the AI community largely acquiesced to Big Tech’s role in shaping the development and impact of these technologies. While some expressed discomfort with the corporate takeover, many more welcomed the industry’s deep well of funding. 

But as the shortcomings of today’s AI have become more evident—both its failure to solve social problems and the mounting examples that it can exacerbate them—faith in Big Tech has weakened. Google’s ousting of Gebru and Mitchell further stoked the discussion by revealing just how much companies will prioritize profit over self-policing.

In the immediate aftermath, over 2,600 Google employees and 4,300 others signed a petition denouncing Gebru’s dismissal as “unprecedented research censorship.” Half a year later, research groups are still rejecting the company’s funding, researchers refuse to participate in its conference workshops, and employees are leaving in protest.

Unlike five years ago, when Gebru began raising these questions, there’s now a well-established movement questioning what AI should be and who it should serve. This isn’t a coincidence. It’s very much a product of Gebru’s own initiative, which began with the simple act of inviting more Black researchers into the field.

It takes a conference

In December 2017, the new Black in AI group hosted its first workshop at NeurIPS. While organizing the workshop, Gebru approached Joy Buolamwini, an MIT Media Lab researcher who was studying commercial facial recognition systems for possible bias. Buolamwini had begun testing these systems after one failed to detect her own face unless she donned a white mask. She submitted her preliminary results to the workshop.

Deborah Raji, then an undergraduate researcher, was another early participant. Raji was appalled by the culture she’d observed at NeurIPS. The workshop became her respite. “To go from four or five days of that to a full day of people that look like me talking about succeeding in this space—it was such important encouragement for me,” she says.

Buolamwini, Raji, and Gebru would go on to work together on a pair of groundbreaking studies about discriminatory computer-vision systems. Buolamwini and Gebru coauthored Gender Shades, which showed that the facial recognition systems sold by Microsoft, IBM, and Chinese tech giant Megvii had remarkably high failure rates on Black women despite near-perfect performance on white men. Raji and Buolamwini then collaborated on a follow-up called Actionable Auditing, which found the same to be true for Amazon’s Rekognition. In 2020, Amazon would agree to a one-year moratorium on police sales of its product, in part because of that work.

At the very first Black in AI workshop, though, these successes were distant possibilities. There was no agenda other than to build community and produce research based on their sorely lacking perspectives. Many onlookers didn’t understand why such a group needed to exist. Gebru remembers dismissive comments from some in the AI community. But for others, Black in AI pointed a new way forward.

This was true for William Agnew and Raphael Gontijo Lopes, both queer men conducting research in computer science, who realized they could form a Queer in AI group. (Other groups that took shape include Latinx in AI, {Dis}Ability in AI, and Muslim in ML.) For Agnew, in particular, having such a community felt like an urgent need. “It was hard to even imagine myself having a happy life,” he says, reflecting on the lack of queer role models in the field. “There’s Turing, but he committed suicide. So that’s depressing. And the queer part of him is just ignored.”

Not all affinity group members see a connection between their identity and their research. Still, each group has established particular expertise. Black in AI has become the intellectual center for exposing algorithmic discrimination, critiquing surveillance, and developing data-efficient AI techniques. Queer in AI has become a center for contesting the ways algorithms infringe on people’s privacy and classify them into bounded categories by default.

Venkatasubramanian and Gebru also helped create the Fairness, Accountability, and Transparency (FAccT) conference to create a forum for research on the social and political implications of AI. Ideas and draft papers discussed at NeurIPS affinity group workshops often become the basis for papers published at FAccT, which then showcases that research to broader audiences.

It was after Buolamwini presented at the first Black in AI workshop, for example, that FAccT published Gender Shades. Along with Actionable Auditing, it then fueled several major education and advocacy campaigns to limit government use of facial recognition. When Amazon attempted to undermine the legitimacy of Buolamwini’s and Raji’s research, dozens of AI researchers and civil society organizations banded together to defend them, foreshadowing what they would later do for Gebru. Those efforts eventually contributed to Amazon’s moratorium, which in May the company announced it would extend indefinitely.

The research also set off a cascade of regulation. More than a dozen cities have banned police use of facial recognition, and Massachusetts now requires police to get a judge’s permission to use it. Both the US and the European Commission have proposed additional regulation.

“First we had to just be there,” says Gebru. “And at some point, what Black in AI says starts to become important. And what all of these groups together say becomes important. You have to listen to us now.”

Follow the money

After Gebru and Mitchell’s firing, the field is grappling anew with an age-old question: Is it possible to change the status quo while working from within? Gebru still believes working with tech giants is the best way to identify the problems. But she also believes that corporate researchers need stronger legal protections. If they see risky practices, they should be able to publicly share their observations without jeopardizing their careers.

Then there’s the question of funding. Many researchers want more investment from the US government to support work that is critical of commercial AI development and advances the public welfare. Last year, it committed a measly $1 billion to non-defense-related AI research. The Biden administration is now asking Congress to invest an additional $180 billion in emerging technologies, with AI as a top priority.

Such funding could help people like Rediet Abebe, an assistant professor of computer science at the University of California, Berkeley. Abebe came into AI with ideas of using it to advance social equity. But when she started her PhD at Cornell, no one was focused on doing such research. 

In the fall of 2016, as a PhD student, she began a small Cornell reading group with a fellow graduate student to study topics like housing instability, health-care access, and inequality. She then embarked on a new project to see whether her computational skills could support efforts to alleviate poverty.

Eventually, she found the Poverty Tracker study, a detailed data set on the financial shocks—unexpected expenses like medical bills or parking tickets—experienced by more than 2,000 New York families. Over many conversations with the study’s authors, social workers, and nonprofits serving marginalized communities, she learned about their needs and told them how she could help. Abebe then developed a model that showed how the frequency and type of shocks affected a family’s economic status. 

Five years later, the project is still ongoing. She’s now collaborating with nonprofits to improve her model and working with policymakers through the California Policy Lab to use it as a tool for preventing homelessness. Her reading group has also since grown into a 2,000-person community and is holding its inaugural conference later this year. 

Abebe sees it as a way to incentivize more researchers to flip the norms of AI. While traditional computer science conferences emphasize advancing computational techniques for the sake of doing so, the new one will publish work that first seeks to deeply understand a social issue. The work is no less technical, but it builds the foundation for more socially meaningful AI to emerge. 

“These changes that we’re fighting for—it’s not just for marginalized groups,” she says. “It’s actually for everyone.”

Anti-vaxxers are weaponizing Yelp to punish bars that require vaccine proof

On the first hot weekend of the summer, Richard Knapp put up a sign outside Mother’s Ruin, a bar tucked in Manhattan’s SoHo neighborhood. It had two arrows: one pointing vaccinated people indoors, another pointing unvaccinated people outdoors.

The Instagram post showing the sign quickly went viral among European anti-vaxxers on Reddit. “We started receiving hate mail through the Google portal,” Knapp says, estimating he’d received a few dozen emails: “I’ve been called a Nazi and a communist in the same sentence. People hope that our bar burns down. It’s a name and shame campaign.” It wasn’t just the emails. Soon, his bar started receiving multiple one-star reviews on Yelp and Google Reviews from accounts as far away as Europe.

Spamming review portals with negative ratings is not a new phenomenon. Throughout the pandemic, the tactic was also deployed against bars and restaurants that enforced mask-wearing. As pandemic restrictions have lifted, businesses like Mother’s Ruin have sought to maintain safety by requiring proof of vaccination, whether via state-sponsored apps like New York’s Excelsior Pass, vaccine passports, or a vaccine card flashed at the door. These practices have instigated a second surge of spam reviews.

These spam one-star reviews can be extremely damaging. Reviews are displayed in chronological order by default, from newest to oldest, so a concerted spam attack pushes the fake reviews to the top of the page, exactly where they carry the most weight.

Some companies have gotten around this issue on their own sites by verifying that reviewers are actual customers, reaching out to them via email and matching them with what they have on file. But industry-leading platforms like Yelp and Google let anyone rate and review a business.

In April, at Bar Max in Denver, Marshall Smith instituted what may have been the US’s first policy requiring patrons to prove they were fully vaccinated against the coronavirus. He didn’t think it would be a big deal to ask customers to show their vaccination cards at the door. “I didn’t consider the politics, and perhaps that was naive on my part,” he says.

Within days, his bar was slammed with one-star reviews on Google that took his average rating from 4.6 out of 5 stars to 4.
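The arithmetic of such an attack is straightforward: because a star rating is an average, a relatively small batch of one-star reviews can drag a high score down sharply. The review counts below are hypothetical, not Bar Max's actual numbers:

```python
def new_average(avg, n_genuine, n_spam, spam_star=1.0):
    """Rating after n_spam one-star reviews join n_genuine
    genuine reviews that currently average avg."""
    total = avg * n_genuine + spam_star * n_spam
    return total / (n_genuine + n_spam)

# Hypothetical: 200 genuine reviews averaging 4.6 stars.
# Just 40 one-star spam reviews pull the average down to 4.0.
print(round(new_average(4.6, 200, 40), 2))  # 4.0
```

In this hypothetical, spam amounting to a sixth of the page erases six-tenths of a star, enough to knock a bar out of a "best of" top 10.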

“We were in the top 10 best reviewed craft cocktail bars in Denver [pre-pandemic],” he says. “It might not sound significant but if you drop out of the first page of results, it’s a big deal: you’re out of top 10 lists, listicle mentions. We don’t do a lot of advertising because people look at our reviews. We’ve built six years of good reviews that’s been chiseled away over a matter of months.”

These reviews don’t stay permanently in a business’s history. Yelp roots out spam, though the company “does not tell anybody [how its spam detection works],” says Bing Liu, a professor of computer science at the University of Illinois at Chicago. Liu co-authored a 2013 paper that attempted to replicate Yelp’s methods and found that the company most likely used keywords to root out possible spammers.
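A keyword-based filter of the kind that paper described can be sketched in a few lines; the suspect phrases below are invented for illustration and are far cruder than whatever Yelp actually runs:

```python
# A naive keyword filter: flag reviews containing phrases associated
# with coordinated, politically motivated spam. Purely illustrative.
SUSPECT_PHRASES = ["nazi", "communist", "burn down", "vaccine passport"]

def looks_like_spam(review_text):
    lowered = review_text.lower()
    return any(phrase in lowered for phrase in SUSPECT_PHRASES)

print(looks_like_spam("Vaccine passports are tyranny! One star."))        # True
print(looks_like_spam("Great cocktails, slow service on a busy night."))  # False
```

The second example also shows the filter's blind spot: a grievance disguised as an ordinary service complaint sails straight through, which is exactly the evasion Liu describes below.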

Smith’s Yelp reviews were shut down after the sudden flurry of activity on his bar’s page, which the company labels an “unusual activity alert”: a stopgap measure that gives both the business and Yelp time to filter through the flood of reviews and pick out which are spam and which aren’t. Noorie Malik, Yelp’s vice president of user operations, said Yelp has a “team of moderators” that investigates pages that get an unusual amount of traffic. “After we’ve seen activity dramatically decrease or stop, we will then clean up the page so that only firsthand consumer experiences are reflected,” she said in a statement.

It’s a practice that Yelp has had to deploy more often over the course of the pandemic: According to Yelp’s 2020 Trust & Safety Report, the company saw a 206% increase over 2019 levels in unusual activity alerts. “Since January 2021, we’ve placed more than 15 unusual activity alerts on business pages related to a business’s stance on covid-19 vaccinations,” said Malik.

The majority of those cases have been since May, like the gay bar C.C. Attles in Seattle, which got an alert from Yelp after it made patrons show proof of vaccination at the door. Earlier this month, Moe’s Cantina in Chicago’s River North neighborhood got spammed after it attempted to isolate vaccinated customers from unvaccinated ones.

Spamming a business with one-star reviews is not a new tactic. In fact, perhaps the best-known case is Colorado’s Masterpiece bakery, which won a 2018 Supreme Court battle for refusing to make a wedding cake for a same-sex couple, after which it got pummeled by one-star reviews. “People are still writing fake reviews. People will always write fake reviews,” Liu says.

But he adds that today’s online audience knows that platforms use algorithms to detect and flag problematic words, so bad actors can mask their grievances as complaints about poor restaurant service, like a more typical negative review, to ensure the rating stays up — and counts.

That seems to have been the case with Knapp’s bar. The spam Yelp reviews included comments like “There was hair in my food” and alleged cockroach sightings. “Really ridiculous, fantastic shit,” Knapp says. “If you looked at previous reviews, you would understand immediately that this doesn’t make sense.” 

Liu also says there is a limit to how much Yelp can improve its spam detection, since natural language — or the way we speak, read, and write — “is very tough for computer systems to detect.” 

But Liu doesn’t think putting a human being in charge of figuring out which reviews are spam or not will solve the problem. “Human beings can’t do it,” he says. “Some people might get it right, some people might get it wrong. I have fake reviews on my webpage and even I can’t tell which are real or not.”

You might notice that I’ve only mentioned Yelp reviews thus far, even though Google reviews — which appear in the business description box on the right side of the Google search results page under “reviews” — are arguably more influential. That’s because Google’s review operations are, frankly, even more mysterious. 

While businesses I spoke to said Yelp worked with them on identifying spam reviews, none of them had any luck with contacting Google’s team. “You would think Google would say, ‘Something is fucked up here,’” Knapp says. “These are IP addresses from overseas. It really undermines the review platform when things like this are allowed to happen.”

Google did not respond to multiple requests for comment; however, within a few hours of our call, Knapp said some problematic reviews on Google had been cleared for him. Smith said he had not yet gotten any response from Google about reviews, save for automated messages saying that multiple reviews he had flagged did not qualify to be taken down because “the reviews in question don’t fall under any of the violation categories, according to our policies.”

Spam reviews aren’t going anywhere and will continue to be a problem for years to come. And the fact remains that online communities — like the European anti-vaxxers that descended upon Mother’s Ruin’s reviews — can destroy faraway livelihoods with the click of a star rating.

Those ratings haunt business owners like Smith. “I still have folks putting one-star reviews on our Google listing,” he says. “Outliers pull down averages, that’s math. It’s a pretty effective means of attack for the folks who do this.”

Knapp feels equally frustrated and helpless. “We’re just trying to survive through the most traumatic experience that’s ever hit the hospitality industry,” he says. “The idea that we are under attack by this community and there is no real vehicle to combat it, that’s frustrating.”

These creepy fake humans herald a new age in AI

You can see the faint stubble coming in on his upper lip, the wrinkles on his forehead, the blemishes on his skin. He isn’t a real person, but he’s meant to mimic one—as are the hundreds of thousands of others made by Datagen, a company that sells fake, simulated humans.

These humans are not gaming avatars or animated characters for movies. They are synthetic data designed to feed the growing appetite of deep-learning algorithms. Firms like Datagen offer a compelling alternative to the expensive and time-consuming process of gathering real-world data. They will make it for you: how you want it, when you want it — and relatively cheaply.

To generate its synthetic humans, Datagen first scans actual humans. It partners with vendors who pay people to step inside giant full-body scanners that capture every detail from their irises to their skin texture to the curvature of their fingers. The startup then takes the raw data and pumps it through a series of algorithms, which develop 3D representations of a person’s body, face, eyes, and hands.

The company, which is based in Israel, says it’s already working with four major US tech giants, though it won’t disclose which ones on the record. Its closest competitor, Synthesis AI, also offers on-demand digital humans. Other companies generate data to be used in finance, insurance, and health care. There are about as many synthetic-data companies as there are types of data.

Once viewed as less desirable than real data, synthetic data is now seen by some as a panacea. Real data is messy and riddled with bias. New data privacy regulations make it hard to collect. By contrast, synthetic data is pristine and can be used to build more diverse data sets. You can produce perfectly labeled faces, say, of different ages, shapes, and ethnicities to build a face-detection system that works across populations.

But synthetic data has its limitations. If it fails to reflect reality, it could end up producing even worse AI than messy, biased real-world data—or it could simply inherit the same problems. “What I don’t want to do is give the thumbs up to this paradigm and say, ‘Oh, this will solve so many problems,’” says Cathy O’Neil, a data scientist and founder of the algorithmic auditing firm ORCAA. “Because it will also ignore a lot of things.”

Realistic, not real

Deep learning has always been about data. But in the last few years, the AI community has learned that good data is more important than big data. Even small amounts of the right, cleanly labeled data can do more to improve an AI system’s performance than 10 times the amount of uncurated data, or even a more advanced algorithm.

That changes the way companies should approach developing their AI models, says Datagen’s CEO and cofounder, Ofir Chakon. Today, they start by acquiring as much data as possible and then tweak and tune their algorithms for better performance. Instead, they should be doing the opposite: use the same algorithm while improving on the composition of their data.

Datagen also generates fake furniture and indoor environments to put its fake humans in context.

But collecting real-world data to perform this kind of iterative experimentation is too costly and time intensive. This is where Datagen comes in. With a synthetic data generator, teams can create and test dozens of new data sets a day to identify which one maximizes a model’s performance.
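In caricature, this data-centric workflow holds the algorithm fixed and searches over data sets instead. The toy "model" and scoring function below are stand-ins, not Datagen's actual pipeline:

```python
# Data-centric iteration in caricature: the model and training recipe
# stay fixed; only the data set changes between experiments.
def train(data):
    # Stand-in "training": the model is just the mean of the data.
    return sum(data) / len(data)

def score(model, target=0.5):
    # Stand-in evaluation: closer to the target is better.
    return -abs(model - target)

def best_dataset(candidates):
    """Train the same model on every candidate data set and return
    the name of the one with the highest evaluation score."""
    scores = {name: score(train(data)) for name, data in candidates.items()}
    return max(scores, key=scores.get)

candidates = {
    "synthetic_v1": [0.2, 0.3, 0.4],
    "synthetic_v2": [0.4, 0.5, 0.6],
    "synthetic_v3": [0.7, 0.8, 0.9],
}
print(best_dataset(candidates))  # synthetic_v2
```

With a synthetic data generator, each entry in `candidates` is cheap to produce, which is what makes running dozens of such experiments a day plausible.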

To ensure the realism of its data, Datagen gives its vendors detailed instructions on how many individuals to scan in each age bracket, BMI range, and ethnicity, as well as a set list of actions for them to perform, like walking around a room or drinking a soda. The vendors send back both high-fidelity static images and motion-capture data of those actions. Datagen’s algorithms then expand this data into hundreds of thousands of combinations. The synthesized data is sometimes then checked again. Fake faces are plotted against real faces, for example, to see if they seem realistic.

Datagen is now generating facial expressions to monitor driver alertness in smart cars, body motions to track customers in cashier-free stores, and irises and hand motions to improve the eye- and hand-tracking capabilities of VR headsets. The company says its data has already been used to develop computer-vision systems serving tens of millions of users.

It’s not just synthetic humans that are being mass-manufactured. Click-Ins is a startup that uses synthetic AI to perform automated vehicle inspections. Using design software, it re-creates all car makes and models that its AI needs to recognize and then renders them with different colors, damages, and deformations under different lighting conditions, against different backgrounds. This lets the company update its AI when automakers put out new models, and helps it avoid data privacy violations in countries where license plates are considered private information and thus cannot be present in photos used to train AI.

Click-Ins renders cars of different makes and models against various backgrounds.
Other synthetic-data startups work with financial, telecommunications, and insurance companies to provide spreadsheets of fake client data that let those companies share their customer databases with outside vendors in a legally compliant way. Anonymization can reduce a data set’s richness yet still fail to adequately protect people’s privacy. But synthetic data can be used to generate detailed fake data sets that share the same statistical properties as a company’s real data. It can also be used to simulate data the company doesn’t yet have, including a more diverse client population or scenarios like fraudulent activity.
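A minimal sketch of the core idea: fit summary statistics from a (stand-in) real table, then sample fake rows that match them. The two columns and the plain-Gaussian model here are assumptions for illustration; real products use far richer generative models.

```python
import math
import random

def column_stats(rows):
    """Means, standard deviations, and Pearson correlation of two
    numeric columns (e.g. hypothetical income and monthly spend)."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    rho = sum((x - mx) * (y - my) for x, y in rows) / (n * sx * sy)
    return mx, my, sx, sy, rho

def synthesize(stats, n, seed=0):
    """Draw fake rows from a Gaussian with the same means, spreads,
    and correlation as the fitted table; no real row is copied."""
    mx, my, sx, sy, rho = stats
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + sx * z1,
                     my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)))
    return rows

# Stand-in for a real customer table (made-up parameters), then a
# fake table fitted to it.
real_rows = synthesize((50000.0, 2000.0, 15000.0, 800.0, 0.7), 4000, seed=7)
fitted = column_stats(real_rows)
fake_rows = synthesize(fitted, 4000, seed=8)
```

The fake rows reproduce the original table’s means, spreads, and correlation, which is what makes aggregate analyses on the fake data line up with the real thing.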

Proponents of synthetic data say that it can help evaluate AI as well. In a recent paper published at an AI conference, Suchi Saria, an associate professor of machine learning and health care at Johns Hopkins University, and her coauthors demonstrated how data-generation techniques could be used to extrapolate different patient populations from a single set of data. This could be useful if, for example, a company only had data from New York City’s more youthful population but wanted to understand how its AI performs on an aging population with higher prevalence of diabetes. She’s now starting her own company, Bayesian Health, which will use this technique to help test medical AI systems.

The limits of faking it

But is synthetic data overhyped?

When it comes to privacy, “just because the data is ‘synthetic’ and does not directly correspond to real user data does not mean that it does not encode sensitive information about real people,” says Aaron Roth, a professor of computer and information science at the University of Pennsylvania. Some data generation techniques have been shown to closely reproduce images or text found in the training data, for example, while others are vulnerable to attacks that make them fully regurgitate that data.

This might be fine for a firm like Datagen, whose synthetic data isn’t meant to conceal the identity of the individuals who consented to be scanned. But it would be bad news for companies that offer synthetic data as a way to protect sensitive financial or patient information.

Research suggests that the combination of two synthetic-data techniques in particular—differential privacy and generative adversarial networks—can produce the strongest privacy protections, says Bernease Herman, a data scientist at the University of Washington eScience Institute. But skeptics worry that this nuance can be lost in the marketing lingo of synthetic-data vendors, which won’t always be forthcoming about what techniques they are using.
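Of the two techniques, differential privacy is the easier one to illustrate: calibrated random noise is added so that no single person’s record can measurably change a released statistic. Here is a minimal sketch of the Laplace mechanism for a counting query (illustrative only, not any vendor’s implementation):

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverse transform sampling."""
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon, seed=0):
    """Epsilon-differentially-private count. A counting query changes
    by at most 1 when one record is added or removed (sensitivity 1),
    so Laplace noise with scale 1/epsilon suffices."""
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Smaller epsilon means stronger privacy and noisier answers.
ages = [23, 35, 41, 29, 52, 60, 31, 45]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

In the pairing Herman describes, the same idea is applied while training the generative adversarial network, with noise added to the training updates rather than to a final statistic.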

Meanwhile, little evidence suggests that synthetic data can effectively mitigate the bias of AI systems. For one thing, extrapolating new data from an existing data set that is skewed doesn’t necessarily produce data that’s more representative. Datagen’s raw data, for example, contains proportionally fewer ethnic minorities, which means it uses fewer real data points to generate fake humans from those groups. While the generation process isn’t entirely guesswork, those fake humans might still be more likely to diverge from reality. “If your darker-skin-tone faces aren’t particularly good approximations of faces, then you’re not actually solving the problem,” says O’Neil.

For another, perfectly balanced data sets don’t automatically translate into perfectly fair AI systems, says Christo Wilson, an associate professor of computer science at Northeastern University. If a credit card lender were trying to develop an AI algorithm for scoring potential borrowers, it would not eliminate all possible discrimination by simply representing white people as well as Black people in its data. Discrimination could still creep in through differences between white and Black applicants.

To complicate matters further, early research shows that in some cases it may not even be possible to achieve both private and fair AI with synthetic data. In a recent paper published at an AI conference, researchers from the University of Toronto and the Vector Institute tried to do so with chest x-rays. They found that they could not build an accurate medical AI system when they generated a diverse synthetic data set by combining differential privacy and generative adversarial networks.

None of this means that synthetic data shouldn’t be used. In fact, it may well become a necessity. As regulators confront the need to test AI systems for legal compliance, it could be the only approach that gives them the flexibility they need to generate on-demand, targeted testing data, O’Neil says. But that makes questions about its limitations even more important to study and answer now.

“Synthetic data is likely to get better over time,” she says, “but not by accident.”

What makes the Delta covid-19 variant more infectious?

Covid cases are on the rise in England, and a fast-spreading variant may be to blame. B.1.617.2, which now goes by the name Delta, first emerged in India, but has since spread to 62 countries, according to the World Health Organization.

Delta is still rare in the US. At a press conference on Tuesday, the White House’s chief medical advisor, Anthony Fauci, said that it accounts for just 6% of cases. But in the UK it has quickly overtaken B.1.1.7—also known as Alpha—to become the dominant strain, which could derail the country’s plans to ease restrictions on June 21.

The total number of cases is still small, but public health officials are watching the variant closely. On Monday, UK Secretary of State for Health and Social Care Matt Hancock reported that Delta appears to be about 40% more transmissible than Alpha, but scientists are still trying to pin down the exact number—estimates range from 30% to 100%. They are also working to understand what makes it more infectious. They don’t yet have many answers, but they do have hypotheses.

All viruses acquire mutations in their genetic code as they replicate, and SARS-CoV-2 is no exception. Many of these mutations have no impact at all. But some change the virus’s structure or function. Identifying changes in the genetic sequence of a virus is simple. Figuring out how those changes impact the way a virus spreads is trickier. The spike protein, which helps the virus gain entry to cells, is a good place to start. 

How Delta enters cells

To infect cells, SARS-CoV-2 must enter the body and bind to receptors on the surface of cells. The virus is studded with mushroom-shaped spike proteins that latch onto a receptor called ACE2 on human cells. This receptor is found on many cell types, including those that line the lungs. Think of it like a key fitting into a lock.

Mutations that help the virus bind more tightly can make transmission from one person to another easier. Imagine you breathe in a droplet that contains SARS-CoV-2. If that droplet contains viruses with better binding capabilities, they “will be more efficient at finding and infecting one of your cells,” says Nathaniel Landau, a microbiologist at NYU Grossman School of Medicine.

Scientists don’t yet know how many particles of SARS-CoV-2 you have to inhale to become infected, but the threshold would likely be lower for a virus that is better at grabbing onto ACE2. 

Landau and his colleagues study binding in the lab by creating pseudoviruses. These lab-engineered viruses can’t replicate, but researchers can tweak them to express the spike protein on their surface. That allows them to easily test binding without needing to use a high-security laboratory. The researchers mix these pseudoviruses with plastic beads covered with ACE2 and then work out how much virus sticks to the beads. The greater the quantity of virus, the better the virus is at binding. In a preprint posted in May, Landau and colleagues show that some of the mutations present in Delta do enhance binding. 

How it infects once inside

But better binding doesn’t just lower the threshold for infection. Because the virus is better at grabbing ACE2, it will also infect more cells inside the body. “The infected person will have more virus in them, because the virus is replicating more efficiently,” Landau says.

After the virus binds to ACE2, the next step is fusion with the cell, which begins when enzymes from the host cell cut the spike at two different sites, a process known as cleavage. This kick-starts the fusion machinery. If binding is like the key fitting in the lock, cleavage is like the key turning the deadbolt. “Without cuts at both sites, the virus can’t get into cells,” says Vineet Menachery, a virologist at The University of Texas Medical Branch.

One of the mutations present in Delta actually occurs in one of these cleavage sites, and a new study that has not yet been peer reviewed shows that this mutation does enhance cleavage. And Menachery, who was not involved in the study, says he has replicated those results in his lab. “So it’s a little bit easier for the virus to be activated,” he says.

Whether that improves transmissibility isn’t yet known, but it could. When scientists delete these cleavage sites, the virus becomes less transmissible and less pathogenic, Menachery says. So it stands to reason that changes that facilitate cleavage would increase transmissibility. 

It’s also possible that Delta’s ability to evade the body’s immune response helps fuel transmission. Immune evasion means more cells become infected and produce more virus, which then potentially makes it easier for a person carrying that virus to infect someone else.

But vaccines still work

The good news is that vaccination provides strong protection against Delta. A new study from Public Health England shows that the Pfizer-BioNTech vaccine was 88% effective in preventing symptomatic disease due to Delta in fully vaccinated people. The AstraZeneca vaccine provided slightly less protection: two shots were 60% effective against the variant. The effectiveness of one dose of either vaccine, however, was much lower, at just 33%.
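Effectiveness figures like the 88% and 60% above come from comparing disease rates between vaccinated and unvaccinated groups. A quick illustration with made-up attack rates (not the study’s actual data):

```python
def vaccine_effectiveness(rate_vaccinated, rate_unvaccinated):
    """Effectiveness = 1 - relative risk of disease given vaccination."""
    return 1 - rate_vaccinated / rate_unvaccinated

# Hypothetical attack rates: 1.2 cases per 1,000 fully vaccinated
# people vs. 10 per 1,000 unvaccinated would imply 88% effectiveness.
ve = vaccine_effectiveness(1.2 / 1000, 10 / 1000)  # -> 0.88
```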

In both the US and the UK, only around 42% of the population is fully vaccinated. In India, where a surge was fueled in part by the rapid spread of Delta, just 3.3% of the population is fully vaccinated.

At the press briefing, Fauci urged those who have not been vaccinated to get their first shot and reminded those who are partially vaccinated not to skip their second dose. The Biden Administration hopes to have 70% of the population at least partially vaccinated by the Fourth of July. In the UK, Delta quickly replaced Alpha to become the dominant strain, and cases are now on the rise. “We cannot let that happen in the United States,” Fauci said. 

The coming productivity boom

The last 15 years have been tough times for many Americans, but there are now encouraging signs of a turnaround.

Productivity growth, a key driver of higher living standards, has averaged only 1.3% a year since 2006, less than half the rate of the previous decade. But on June 3, the US Bureau of Labor Statistics reported that US labor productivity increased by 5.4% in the first quarter of 2021. Better still, there’s reason to believe that this is not just a blip but a harbinger of better times ahead: a productivity surge that will match or surpass the boom times of the 1990s.
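The gap between those growth rates compounds sharply over time. A back-of-the-envelope comparison, using a hypothetical 2.8% to stand in for the previous decade’s pace:

```python
def cumulative_growth(annual_rate, years):
    """Total growth from compounding an annual rate over `years`."""
    return (1 + annual_rate) ** years - 1

# 1.3% a year for 15 years vs. a hypothetical 2.8% over the same span
slow = cumulative_growth(0.013, 15)  # ~0.21, i.e. about 21% richer
fast = cumulative_growth(0.028, 15)  # ~0.51, i.e. about 51% richer
```

Sustained over a generation, that difference in annual rates more than doubles the cumulative gain in output per hour.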

Annual Labor Productivity Growth, 2001 – 2021 Q1
For much of the past decade, productivity growth has been sluggish, but now there are signs it’s picking up. (Source: US Bureau of Labor Statistics)

Our optimism is grounded in our research, which indicates that most OECD countries are just passing the lowest point in a productivity J-curve. Driven by advances in digital technologies such as artificial intelligence, productivity growth is now headed up.


The productivity J-curve describes the historical pattern of initially slow productivity growth after a breakthrough technology is introduced, followed years later by a sharp takeoff. Our research and that of others has found that technology alone is rarely enough to create significant benefits. Instead, technology investments must be combined with even larger investments in new business processes, skills, and other types of intangible capital before breakthroughs as diverse as the steam engine or computers ultimately boost productivity. For instance, after electricity was introduced to American factories, productivity was stagnant for more than two decades. It was only after managers reinvented their production lines using distributed machinery, a technique made possible by electricity, that productivity surged.

There are three reasons that this time around the productivity J-curve will be bigger and faster than in the past.

The first is technological: the past decade has delivered an astonishing cluster of technology breakthroughs. The most important are in AI: the development of machine-learning algorithms, combined with a steep decline in the price of data storage and improvements in computing power, has allowed firms to address challenges from vision and speech to prediction and diagnosis. The fast-growing cloud computing market has made these innovations accessible to smaller firms.

Significant innovations have also happened in biomedical sciences and energy. In drug discovery and development, new technologies have allowed researchers to optimize the design of new drugs and predict the 3D structures of proteins. At the same time, breakthrough vaccine technology using messenger RNA has introduced a revolutionary approach that could lead to effective treatments for many other diseases. Moreover, major innovations have led to the steep decline in the price of solar energy and the sharp increase in its energy conversion efficiency rate with serious implications for the future of the energy sector as well as for the environment.

The second reason is the pandemic. The costs of covid-19 have been tragic, but the crisis has also compressed a decade’s worth of digital innovation, in areas like remote work, into less than a year. What’s more, evidence suggests that even after the pandemic a significant fraction of work will be done remotely, while a new class of high-skill service workers, the digital nomads, is emerging.

As a result, the biggest productivity impact of the pandemic will be realized over the longer run. Even technology skeptics like Robert Gordon are more optimistic this time. The digitization and reorganization of work has brought us to a turning point in the productivity J-curve.

The third reason to be optimistic about productivity has to do with the aggressive fiscal and monetary policy being implemented in the US. The recent covid-19 relief package is likely to reduce the unemployment rate from 5.8% (in May 2021) to the historically low pre-covid levels in the neighborhood of 4%. Running the economy hot with full employment can accelerate the arrival of the productivity boom. Low unemployment levels drive higher wages which means firms have more incentive to harvest the potential benefits of technology to further improve productivity.

When you put these three factors together—the bounty of technological advances, the compressed restructuring timetable due to covid-19, and an economy finally running at full capacity—the ingredients are in place for a productivity boom. It will not only boost living standards directly but also free up resources for a more ambitious policy agenda.

Erik Brynjolfsson is a professor at Stanford and director of the Stanford Digital Economy Lab. Georgios Petropoulos is a post-doc at MIT, a research fellow at Bruegel, and a digital fellow at the Stanford Digital Economy Lab.

TikTok changed the shape of some people’s faces without asking

“That’s not my face,” Tori Dawn thought after opening TikTok to make a video in late May. The jaw reflected back on the screen was wrong: slimmer and more feminine. And when they waved their hand in front of the camera, blocking most of their face from the lens, their jaw appeared to pop back to normal. Was their skin also a little softer? 

On further investigation, it seemed as if the image was being run through a beauty filter in the TikTok app. Normally, Dawn keeps those filters off in livestreams and videos to around 320,000 followers. But as they flipped through the app’s settings, there was no way to disable the effect: it seemed to be permanently in place, subtly feminizing Dawn’s features.

“My face is pretty androgynous and I like my jawline,” Dawn said in an interview. “So when I saw that it was popping in and out, I’m like ‘Why would they do that, why?’ This is one of the only things that I like about my face. Why would you do that?” 

Beauty filters are now a part of life online, allowing users to opt in to changing the face they present to the world on social media. Filters can widen eyes, plump up lips, apply makeup, and change the shape of the face, among other things. But it’s usually a choice, not forced on users—which is why Dawn and others who encountered this strange effect were so angry and disturbed by it. 

Dawn told followers about it in a video, showing the effect pop in and out on screen: “I don’t feel comfortable making videos because this is not what I look like, and I don’t know how to fix it.” The video got more than 300,000 views, they said, and was shared and duetted by other users who noticed the same thing. 



“Is that why I’ve been kind of looking like an alien lately?” said one. 

“Tiktok. Fix this,” said another.

Videos like these circulated for days in late May, as a portion of TikTok’s users looked into the camera and saw a face that wasn’t their own. As the videos spread, many users wondered whether the company was secretly testing out a beauty filter on some users. 

An odd, temporary issue

I’m a TikTok lurker, not a maker, so it was only after seeing Dawn’s video that I decided to see if the effect appeared on my own camera. Once I started making a video, the change to my jaw shape was obvious. I suspected, but couldn’t tell for sure, that my skin had been smoothed as well. I sent a video of it in action to coworkers and my Twitter followers, asking them to open the app and try the same thing on their own phones: from their responses, I learned that the effect only seemed to affect Android phones. I reached out to TikTok, and the effect stopped appearing two days later. The company later acknowledged in a short statement that there was an issue that had been resolved, but did not provide further details.

On the surface it was an odd, temporary issue that affected some users and not others. But it was also forcibly changing people’s appearance—an important glitch for an app that is used by around 100 million people in the US. So I also sent the video to Amy Niu, a PhD candidate at the University of Wisconsin who studies the psychological impact of beauty filters. She pointed out that in China, and some other places, some apps add a subtle beauty filter by default. When Niu uses apps like WeChat, she can only really tell that a filter is in place by comparing a photo of herself using her camera with the image produced in the app. 

A couple of months ago, she said, she downloaded the Chinese version of TikTok, called Douyin. “When I turned off the beauty mode and filters, I can still see an adjustment to my face,” she said. 

Having beauty filters in an app isn’t necessarily a bad thing, Niu said, but app designers have a responsibility to consider how those filters will be used, and how they will change the people who use them. Even if it was a temporary bug, it could have an impact on how people see themselves.

“People’s internalization of beauty standards, their own body image, or whether they will intensify their appearance concern,” are all considerations, Niu said. 

For Dawn, the strange facial effect was just one more thing to add to the list of frustrations with TikTok: “It’s been very reminiscent of a relationship with a narcissist, because they love-bomb you one minute, they’re giving you all these followers and all this attention and it feels so good,” they said. “And then for some reason they just—they’re just like, we’re cutting you off.”