A US push to use ethanol as aviation fuel raises major climate concerns

Eliminating carbon pollution from aviation is one of the most challenging parts of the climate puzzle, simply because large commercial airliners are too heavy and need too much power during takeoff for today’s batteries to do the job.

But one way that companies and governments are striving to make some progress is through the use of various types of sustainable aviation fuels (SAFs), which are derived from non-petroleum sources and promise to be less polluting than standard jet fuel.

This week, the US announced a push to help its biggest commercial crop, corn, become a major feedstock for SAFs. 

Federal guidelines announced on April 30 provide a pathway for ethanol producers to earn SAF tax credits under the Inflation Reduction Act, President Biden’s signature climate law, when the fuel is produced from corn or soy grown on farms that adopt certain sustainable agricultural practices.

It’s a limited pilot program, since the subsidy itself expires at the end of this year. But it could set the template for programs in the future that may help ethanol producers generate more and more SAFs, as the nation strives to produce billions of gallons of those fuels per year by 2030. 

Consequently, the so-called Climate Smart Agriculture program has already set off alarm bells among some observers, who fear that the federal government is both overestimating the emissions benefits of ethanol and assigning too much credit to the agricultural practices in question. Those include cover crops, no-till techniques that minimize soil disturbances, and use of “enhanced-efficiency fertilizers,” which are designed to increase uptake by plants and thus reduce runoff into the environment.

The IRA offers a tax credit of $1.25 per gallon for SAFs that are 50% lower in emissions than standard jet fuel, and as much as 50 cents per gallon more for sustainable fuels that are cleaner still. The new program can help corn- or soy-based ethanol meet that threshold when the source crops are produced using some or all of those agricultural practices.

Since the vast majority of US ethanol is produced from corn, let’s focus on the issues around that crop. To get technical, the program allows ethanol producers to subtract 10 grams of carbon dioxide per megajoule of energy, a measure of carbon intensity, from the life-cycle emissions of the fuel when it’s generated from corn produced with all three of the practices mentioned. That’s about an eighth to a tenth of the carbon intensity of gasoline.
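
For a rough sense of how that arithmetic works, here is a minimal sketch in Python. The jet fuel baseline and the ethanol carbon-intensity figure below are illustrative assumptions, not numbers from the guidance; only the 10-gram credit and the 50% threshold come from the program described above.

```python
# A back-of-the-envelope sketch of the IRA's 50% threshold. The baseline and
# ethanol carbon-intensity values are illustrative assumptions; only the
# 10 gCO2e/MJ practice credit and the 50% cutoff come from the text above.
JET_FUEL_BASELINE_CI = 89.0  # assumed petroleum jet fuel carbon intensity, gCO2e/MJ
PRACTICE_CREDIT = 10.0       # credit for using all three climate-smart practices, gCO2e/MJ

def qualifies_for_saf_credit(fuel_ci: float, uses_all_practices: bool) -> bool:
    """Return True if the fuel's life-cycle emissions are at least 50% below the baseline."""
    effective_ci = fuel_ci - (PRACTICE_CREDIT if uses_all_practices else 0.0)
    reduction = 1 - effective_ci / JET_FUEL_BASELINE_CI
    return reduction >= 0.50

# A hypothetical corn-ethanol pathway at 54 gCO2e/MJ misses the bar on its own
# but clears it once the practice credit is applied.
print(qualifies_for_saf_credit(54.0, uses_all_practices=False))  # False (~39% reduction)
print(qualifies_for_saf_credit(54.0, uses_all_practices=True))   # True  (~51% reduction)
```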

Ethanol’s questionable climate footprint

Today, US-generated ethanol is mainly mixed with gasoline. But ethanol producers are eager to develop new markets for the product as electric vehicles make up a larger share of the cars and trucks on the road. Not surprisingly, then, industry trade groups applauded the announcement this week.

The first concern with the new program, however, is that the emissions benefits of corn-based ethanol have been hotly debated for decades.

Corn, like any plant that uses photosynthesis to produce food, sucks up carbon dioxide from the air. But using corn for fuel rather than food also creates pressure to clear more land for farming, a process that releases carbon dioxide from plants and soil. In addition, planting, fertilizing, and harvesting corn produce climate pollution as well, and the same is true of refining, distributing, and burning ethanol. 

For its analyses under the new program, the Treasury Department intends to use an updated version of the so-called GREET model, developed by the Department of Energy’s Argonne National Lab, to evaluate the life-cycle emissions of SAFs. A 2021 study from the lab, relying on that model, concluded that US corn ethanol produced as much as 52% less greenhouse gas than gasoline.

But some researchers and nonprofits have criticized the tool for accepting low estimates of the emissions impacts of land-use changes, among other issues. Other assessments of ethanol emissions have been far more damning.

A 2022 EPA analysis surveyed the findings from a variety of models that estimate the life-cycle emissions of corn-based ethanol and found that in seven out of 20 cases, ethanol’s emissions exceeded 80% of the climate pollution from gasoline and diesel.

Moreover, the three most recent estimates from those models found ethanol emissions surpassed even the higher-end estimates for gasoline or diesel, Alison Cullen, chair of the EPA’s science advisory board, noted in a 2023 letter to the administrator of the agency.

“Thus, corn starch ethanol may not meet the definition of a renewable fuel” under the federal law that mandates the use of biofuels in the market, she wrote. If so, it’s then well short of the 50% threshold required by the IRA, and some say it’s not clear that the farming practices laid out this week could close the gap.

Agricultural practices

Nikita Pavlenko, who leads the fuels team at the International Council on Clean Transportation, a nonprofit research group, asserted in an email that the climate-smart agricultural provisions “are extremely sloppy” and “are not substantiated.” 

He said the Department of Energy and Department of Agriculture especially “put their thumbs on the scale” on the question of land-use changes, using estimates of soy and corn emissions that were 33% to 55% lower than those produced for a program associated with the UN’s International Civil Aviation Organization.

He finds that ethanol sourced from farms using these agriculture practices will still come up short of the IRA’s 50% threshold, and that producers may have to take additional steps to curtail emissions, potentially including adding carbon capture and storage to ethanol facilities or running operations on renewables like wind or solar.

Freya Chay, a program lead at CarbonPlan, which evaluates the scientific integrity of carbon removal methods and other climate actions, says that these sorts of agricultural practices can provide important benefits, including improving soil health, reducing erosion, and lowering the cost of farming. But she and others have stressed that confidently determining when certain practices actually and durably increase carbon in soil is “exceedingly complex” and varies widely depending on soil type, local climate conditions, past practices, and other variables.

One recent study of no-till practices found that the carbon benefits quickly fade away over time and reach nearly zero in 14 years. If so, this technique would do little to help counter carbon emissions from fuel combustion, which can persist in the atmosphere for centuries or more.

“US policy has a long history of asking how to continue justifying investment in ethanol rather than taking a clear-eyed approach to evaluating whether or not ethanol helps us reach our climate goals,” Chay wrote in an email. “In this case, I think scrutiny is warranted around the choice to lean on agricultural practices with uncertain and variable benefits in a way that could unlock the next tranche of public funding for corn ethanol.”

There are many other paths for producing SAFs that are or could be less polluting than ethanol. For example, they can be made from animal fats, agriculture waste, forest trimmings, or non-food plants that grow on land unsuitable for commercial crops. Other companies are developing various types of synthetic fuels, including electrofuels produced by capturing carbon from plants or the air and then combining it with cleanly sourced hydrogen. 

But all these methods are much more expensive than extracting and refining fossil fuels, and most of the alternative fuels will still produce more emissions when they’re used than the amount that was pulled out of the atmosphere by the plants or processes in the first place. 

The best way to think of these fuels is arguably as a stopgap, a possible way to make some climate progress while smart people strive to develop and build fully emissions-free ways of quickly, safely, and reliably moving things and people around the globe.

Sam Altman says helpful agents are poised to become AI’s killer function

A number of moments from my brief sit-down with Sam Altman brought the OpenAI CEO’s worldview into clearer focus. The first was when he pointed to my iPhone SE (the one with the home button that’s mostly hated) and said, “That’s the best iPhone.” More revealing, though, was the vision he sketched for how AI tools will become even more enmeshed in our daily lives than the smartphone.

“What you really want,” he told MIT Technology Review, “is just this thing that is off helping you.” Altman, who was visiting Cambridge for a series of events hosted by Harvard and the venture capital firm Xfund, described the killer app for AI as a “super-competent colleague that knows absolutely everything about my whole life, every email, every conversation I’ve ever had, but doesn’t feel like an extension.” It could tackle some tasks instantly, he said, and for more complex ones it could go off and make an attempt, but come back with questions for you if it needs to. 

It’s a leap from OpenAI’s current offerings. Its leading applications, like DALL-E, Sora, and ChatGPT (which Altman referred to as “incredibly dumb” compared with what’s coming next), have wowed us with their ability to generate convincing text and surreal videos and images. But they mostly remain tools we use for isolated tasks, and they have limited capacity to learn about us from our conversations with them. 

In the new paradigm, as Altman sees it, the AI will be capable of helping us outside the chat interface and taking real-world tasks off our plates. 

Altman on AI hardware’s future 

I asked Altman if we’ll need a new piece of hardware to get to this future. Though smartphones are extraordinarily capable, and their designers are already incorporating more AI-driven features, some entrepreneurs are betting that the AI of the future will require a device that’s more purpose-built. Some of these devices are already beginning to appear in his orbit. There is the (widely panned) wearable AI Pin from Humane, for example (Altman is an investor in the company but has not exactly been a booster of the device). He is also rumored to be working with former Apple designer Jony Ive on some new type of hardware.

But Altman says there’s a chance we won’t necessarily need a device at all. “I don’t think it will require a new piece of hardware,” he told me, adding that the type of app envisioned could exist in the cloud. But he quickly added that even if this AI paradigm shift won’t require consumers to buy new hardware, “I think you’ll be happy to have [a new device].”

Though Altman says he thinks AI hardware devices are exciting, he also implied he might not be best suited to take on the challenge himself: “I’m very interested in consumer hardware for new technology. I’m an amateur who loves it, but this is so far from my expertise.”

On the hunt for training data

Upon hearing his vision for powerful AI-driven agents, I wondered how it would square with the industry’s current scarcity of training data. To build GPT-4 and other models, OpenAI has scoured internet archives, newspapers, and blogs for training data, since scaling laws have long shown that making models bigger also makes them better. But finding more data to train on is a growing problem. Much of the internet has already been slurped up, and access to private or copyrighted data is now mired in legal battles. 

Altman is optimistic this won’t be a problem for much longer, though he didn’t articulate the specifics. 

“I believe, but I’m not certain, that we’re going to figure out a way out of this thing of you always just need more and more training data,” he says. “Humans are existence proof that there is some other way to [train intelligence]. And I hope we find it.”

On who will be poised to create AGI

OpenAI’s central vision has long revolved around the pursuit of artificial general intelligence (AGI), or an AI that can reason as well as or better than humans. Its stated mission is to ensure such a technology “benefits all of humanity.” It is far from the only company pursuing AGI, however. So in the race for AGI, what are the most important tools? I asked Altman if he thought the entity that marshals the largest amount of chips and computing power will ultimately be the winner. 

Altman suspects there will be “several different versions [of AGI] that are better and worse at different things.” “You’ll have to be over some compute threshold, I would guess,” he says. “But even then I wouldn’t say I’m certain.”

On when we’ll see GPT-5

You thought he’d answer that? When another reporter in the room asked Altman if he knew when the next version of GPT is slated to be released, he gave a calm response. “Yes,” he replied, smiling, and said nothing more. 

The Download: mysterious radio energy from outer space, and banning TikTok

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Inside the quest to map the universe with mysterious bursts of radio energy

When our universe was less than half as old as it is today, a burst of energy that could cook a sun’s worth of popcorn shot out from somewhere amid a compact group of galaxies. Some 8 billion years later, radio waves from that burst reached Earth and were captured by a sophisticated low-frequency radio telescope in the Australian outback. 

The signal, which arrived in June 2022, and lasted for under half a millisecond, is one of a growing class of mysterious radio signals called fast radio bursts. In the last 10 years, astronomers have picked up nearly 5,000 of them. This one was particularly special: nearly double the age of anything previously observed, and three and a half times more energetic. 

No one knows what causes fast radio bursts. They flash in a seemingly random and unpredictable pattern from all over the sky. But despite the mystery, these radio waves are starting to prove extraordinarily useful. Read the full story.

—Anna Kramer

The depressing truth about TikTok’s impending ban

Trump’s 2020 executive order banning TikTok came to nothing in the end. Yet the idea—that the US government should ban TikTok in some way—never went away. It would repeatedly be suggested in different forms and shapes. And eventually, on April 24, 2024, things came full circle with the bill passed in Congress and signed into law.

A lot has changed in those four years. Back then, TikTok was a rising sensation that many people didn’t understand; now, it’s one of the biggest social media platforms. But if the TikTok saga tells us anything, it’s that the US is increasingly inhospitable for Chinese companies. Read the full story.

—Zeyi Yang

This story is from China Report, our weekly newsletter covering tech and policy in China. Sign up to receive it in your inbox every Tuesday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Changpeng Zhao has been sentenced to just four months in prison
The crypto exchange founder got off pretty lightly after pleading guilty to a money-laundering violation. (The Verge)
+ The US Department of Justice had sought a three-year sentence. (The Guardian)

2 Tesla has gutted its charging team
Which is extremely bad news for those reliant on its massive charging network. (NYT $)
+ And more layoffs may be coming down the road. (The Information $)
+ Why getting more EVs on the road is all about charging. (MIT Technology Review)

3 A group of newspapers joined forces to sue OpenAI 
It comes just after the AI firm signed a deal with the Financial Times to use its articles as training data for its models. (WP $)
+ Meanwhile, Google is working with News Corp to fund new AI content. (The Information $)
+ OpenAI’s hunger for data is coming back to bite it. (MIT Technology Review)

4 Worldcoin is thriving in Argentina
The cash it offers in exchange for locals’ biometric data is a major incentive as unemployment in the country bites. (Rest of World)
+ Deception, exploited workers, and cash handouts: How Worldcoin recruited its first half a million test users. (MIT Technology Review)

5 Bill Gates’ shadow looms large over Microsoft
The company’s AI revolution is no accident. (Insider $)

6 It’s incredibly difficult to turn off a car’s location tracking
Domestic abuse activists worry the technology plays into abusers’ hands. (The Markup)
+ Regulators are paying attention. (NYT $)

7 Brain monitors have a major privacy problem
Many of them sell your neural data without asking additional permission. (New Scientist $)
+ How your brain data could be used against you. (MIT Technology Review)

8 ECMO machines are a double-edged sword
They help keep critically ill patients alive. But at what cost? (New Yorker $)

9 How drones are helping protect wildlife from predators
So long as wolves stop trying to play with the drones, that is. (Undark Magazine)

10 This plastic contains bacteria that’ll break it down
It has the unusual side-effect of making the plastic even stronger, too. (Ars Technica)
+ Think that your plastic is being recycled? Think again. (MIT Technology Review)

Quote of the day

“I have constantly been looking ahead for the next thing that’s going to crush all my dreams and the stuff that I built.”

—Tony Northrup, a stock image photographer, explains to the Wall Street Journal how generative AI is finally killing an industry that weathered the advent of digital cameras and the internet.

The big story

A new tick-borne disease is killing cattle in the US

November 2021

In the spring of 2021, Cynthia and John Grano, who own a cattle operation in Culpeper County, Virginia, started noticing some of their cows slowing down and acting “spacey.” They figured the animals were suffering from a common infectious disease that causes anemia in cattle. But their veterinarian had warned them that another disease carried by a parasite was spreading rapidly in the area.

After a third cow died, the Granos decided to test its blood. Sure enough, the test came back positive for the disease: theileria. And with no treatment available, the cows kept dying.

Livestock producers around the US are confronting this new and unfamiliar disease without much information, and researchers still don’t know how theileria will unfold, even as it quickly spreads west across the country. Read the full story.

—Britta Lokting

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)

+ This Instagram account documenting the weird and wonderful world of Beanie Babies is the perfect midweek pick-me-up.
+ Challengers is great—but have you seen the rest of the best sports films?
+ This human fruit machine is killing me.
+ Evan Narcisse is a giant in the video games world.

The depressing truth about TikTok’s impending ban

This story first appeared in China Report, MIT Technology Review’s newsletter about technology in China. Sign up to receive it in your inbox every Tuesday.

Allow me to indulge in a little reflection this week. Last week, the divest-or-ban TikTok bill was passed in Congress and signed into law. Four years ago, when I was just starting to report on the world of Chinese technologies, one of my first stories was about very similar news: President Donald Trump announcing he’d ban TikTok. 

That 2020 executive order came to nothing in the end—it was blocked in the courts, put aside after the presidency changed hands, and eventually withdrawn by the Biden administration. Yet the idea—that the US government should ban TikTok in some way—never went away. It would repeatedly be suggested in different forms and shapes. And eventually, on April 24, 2024, things came full circle.

A lot has changed in the four years between these two news cycles. Back then, TikTok was a rising sensation that many people didn’t understand; now, it’s one of the biggest social media platforms, the originator of a generation-defining content medium, and a music-industry juggernaut. 

What has also changed is my outlook on the issue. For a long time, I thought TikTok would find a way out of the political tensions, but I’m increasingly pessimistic about its future. And I have even less hope for other Chinese tech companies trying to go global. If the TikTok saga tells us anything, it’s that their Chinese roots will be scrutinized forever, no matter what they do.

I don’t believe TikTok has become a larger security threat now than it was in 2020. There have always been issues with the app, like potential operational influence by the Chinese government, the black-box algorithms that produce unpredictable results, and the fact that parent company ByteDance never managed to separate the US side and the China side cleanly, despite efforts (one called Project Texas) to store and process American data locally. 

But none of those problems got worse over the last four years. And interestingly, while discussions in 2020 still revolved around potential remedies like setting up data centers in the US to store American data or having an organization like Oracle audit operations, those kinds of fixes are not in the law passed this year. As long as it still has Chinese owners, the app is not permissible in the US. The only thing it can do to survive here is transfer ownership to a US entity. 

That’s the cold, hard truth not only for TikTok but for other Chinese companies too. In today’s political climate, any association with China and the Chinese government is seen as unacceptable. It’s a far cry from the 2010s, when Chinese companies could dream about developing a killer app and finding audiences and investors around the globe—something many did pull off. 

There’s something I wrote four years ago that still rings true today: TikTok is the bellwether for Chinese companies trying to go global. 

The majority of Chinese tech giants, like Alibaba, Tencent, and Baidu, operate primarily within China’s borders. TikTok was the first to gain mass popularity in lots of other countries across the world and become part of daily life for people outside China. To many Chinese startups, it showed that the hard work of trying to learn about foreign countries and users can eventually pay off, and it’s worth the time and investment to try.

On the other hand, if even TikTok can’t get itself out of trouble, with all the resources that ByteDance has, is there any hope for the smaller players?

When TikTok found itself in trouble, the initial reaction of these other Chinese companies was to conceal their roots, hoping they could avoid attention. During my reporting, I’ve encountered multiple companies that fret about being described as Chinese. “We are headquartered in Boston,” one would say, while everyone in China openly talked about its product as the overseas version of a Chinese app.

But with all the political back-and-forth about TikTok, I think these companies are also realizing that concealing their Chinese associations doesn’t work—and it may make them look even worse if it leaves users and regulators feeling deceived.

With the new divest-or-ban bill, I think these companies are getting a clear signal that it’s not the technical details that matter—only their national origin. The same worry is spreading to many other industries, as I wrote in this newsletter last week. Even in the climate and renewable power industries, the presence of Chinese companies is becoming increasingly politicized. They, too, are finding themselves scrutinized more for their Chinese roots than for the actual products they offer.

Obviously, none of this is good news to me. When they feel unwelcome in the US market, Chinese companies don’t feel the need to talk to international media anymore. Without these vital conversations, it’s even harder for people in other countries to figure out what’s going on with tech in China.

Instead of banning TikTok because it’s Chinese, maybe we should go back to focus on what TikTok did wrong: why certain sensitive political topics seem deprioritized on the platform; why Project Texas has stalled; how to make the algorithmic workings of the platform more transparent. These issues, instead of whether TikTok is still controlled by China, are the things that actually matter. It’s a harder path to take than just banning the app entirely, but I think it’s the right one.

Do you believe the TikTok ban will go through? Let me know your thoughts at zeyi@technologyreview.com.


Now read the rest of China Report

Catch up with China

1. Facing the possibility of a total ban on TikTok, influencers and creators are making contingency plans. (Wired $)

2. TSMC has brought hundreds of Taiwanese employees to Arizona to build its new chip factory. But the company is struggling to bridge cultural and professional differences between American and Taiwanese workers. (Rest of World)

3. The US secretary of state, Antony Blinken, met with Chinese president Xi Jinping during a visit to China this week. (New York Times $)

  • Here’s the best way to describe these recent US-China diplomatic meetings: “The US and China talk past each other on most issues, but at least they’re still talking.” (Associated Press)

4. Half of Russian companies’ payments to China are made through middlemen in Hong Kong, Central Asia, or the Middle East to evade sanctions. (Reuters $)

5. A massive auto show is taking place in Beijing this week, with domestic electric vehicles unsurprisingly taking center stage. (Associated Press)

  • Meanwhile, Elon Musk squeezed in a quick trip to China and met with his “old friend” the Chinese premier Li Qiang, who was believed to have facilitated establishing the Gigafactory in Shanghai. (BBC)
  • Tesla may finally get a license to deploy its autopilot system, which it calls Full Self Driving, in China after agreeing to collaborate with Baidu. (Reuters $)

6. Beijing has hosted two rival Palestinian political groups, Hamas and Fatah, to talk about potential reconciliation. (Al Jazeera)

Lost in translation

The Chinese dubbing community is grappling with the impacts of new audio-generating AI tools. According to the Chinese publication ACGx, for a new audio drama, a music company licensed the voice of the famous dubbing actor Zhao Qianjing and used AI to transform it into multiple characters and voice the entire script. 

But online, this wasn’t really celebrated as an advancement for the industry. Beyond criticizing the quality of the audio drama (saying it still doesn’t sound like real humans), dubbers are worried about the replacement of human actors and increasingly limited opportunities for newcomers. Other than this new audio drama, there have been several examples in China where AI audio generation has been used to replace human dubbers in documentaries and games. E-book platforms have also allowed users to choose different audio-generated voices to read out the text. 

One more thing

While in Beijing, Antony Blinken visited a record store and bought two vinyl records—one by Taylor Swift and another by the Chinese rock star Dou Wei. Many Chinese (and American!) people learned for the first time that Blinken had previously been in a rock band.

Inside the quest to map the universe with mysterious bursts of radio energy

When our universe was less than half as old as it is today, a burst of energy that could cook a sun’s worth of popcorn shot out from somewhere amid a compact group of galaxies. Some 8 billion years later, radio waves from that burst reached Earth and were captured by a sophisticated low-frequency radio telescope in the Australian outback. 

The signal, which arrived on June 10, 2022, and lasted for under half a millisecond, is one of a growing class of mysterious radio signals called fast radio bursts. In the last 10 years, astronomers have picked up nearly 5,000 of them. This one was particularly special: nearly double the age of anything previously observed, and three and a half times more energetic. 

But like the others that came before, it was otherwise a mystery. No one knows what causes fast radio bursts. They flash in a seemingly random and unpredictable pattern from all over the sky. Some appear from within our galaxy, others from previously unexamined depths of the universe. Some repeat in cyclical patterns for days at a time and then vanish; others have been consistently repeating every few days since we first identified them. Most never repeat at all. 

Despite the mystery, these radio waves are starting to prove extraordinarily useful. By the time our telescopes detect them, they have passed through clouds of hot, rippling plasma, through gas so diffuse that particles barely touch each other, and through our own Milky Way. And every time they hit the free electrons floating in all that stuff, the waves shift a little bit. The ones that reach our telescopes carry with them a smeary fingerprint of all the ordinary matter they’ve encountered between wherever they came from and where we are now. 

This makes fast radio bursts, or FRBs, invaluable tools for scientific discovery—especially for astronomers interested in the very diffuse gas and dust floating between galaxies, which we know very little about. 

“We don’t know what they are, and we don’t know what causes them. But it doesn’t matter. This is the tool we would have constructed and developed if we had the chance to be playing God and create the universe,” says Stuart Ryder, an astronomer at Macquarie University in Sydney and the lead author of the Science paper that reported the record-breaking burst. 

Many astronomers now feel confident that finding more such distant FRBs will enable them to create the most detailed three-dimensional cosmological map ever made—what Ryder likens to a CT scan of the universe. Even just five years ago, making such a map might have seemed an intractable technical challenge: spotting an FRB and then recording enough data to determine where it came from is extraordinarily difficult, because most of that work must happen in the few milliseconds before the burst passes.

But that challenge is about to be obliterated. By the end of this decade, a new generation of radio telescopes and related technologies coming online in Australia, Canada, Chile, California, and elsewhere should transform the effort to find FRBs—and help unpack what they can tell us. What was once a series of serendipitous discoveries will become something that’s almost routine. Not only will astronomers be able to build out that new map of the universe, but they’ll have the chance to vastly improve our understanding of how galaxies are born and how they change over time. 

Where’s the matter?

In 1998, astronomers counted up the weight of all of the identified matter in the universe and got a puzzling result. 

We know that about 5% of the total weight of the universe is made up of baryons like protons and neutrons—the particles that make up atoms, or all the “stuff” in the universe. (The other 95% includes dark energy and dark matter.) But the astronomers managed to locate only about 2.5%, not 5%, of the universe’s total. “They counted the stars, black holes, white dwarfs, exotic objects, the atomic gas, the molecular gas in galaxies, the hot plasma, etc. They added it all up and wound up at least a factor of two short of what it should have been,” says Xavier Prochaska, an astrophysicist at the University of California, Santa Cruz, and an expert in analyzing the light in the early universe. “It’s embarrassing. We’re not actively observing half of the matter in the universe.”

All those missing baryons were a serious problem for simulations of how galaxies form, how our universe is structured, and what happens as it continues to expand. 

Astronomers began to speculate that the missing matter exists in extremely diffuse clouds of what’s known as the warm–hot intergalactic medium, or WHIM. Theoretically, the WHIM would contain all that unobserved material. After the 1998 paper was published, Prochaska committed himself to finding it. 

But nearly 10 years of his life and about $50 million in taxpayer money later, the hunt was going very poorly.

That search had focused largely on picking apart the light from distant galactic nuclei and studying x-ray emissions from tendrils of gas connecting galaxies. The breakthrough came in 2007, when Prochaska was sitting on a couch in a meeting room at the University of California, Santa Cruz, reviewing new research papers with his colleagues. There, amid the stacks of research, sat the paper reporting the discovery of the first FRB.

Duncan Lorimer and David Narkevic, astronomers at West Virginia University, had discovered a recording of an energetic radio wave unlike anything previously observed. The wave lasted for less than five milliseconds, and its spectral lines were very smeared and distorted, unusual characteristics for a radio pulse that was also brighter and more energetic than other known transient phenomena. The researchers concluded that the wave could not have come from within our galaxy, meaning that it had traveled some unknown distance through the universe. 

Here was a signal that had traversed long distances of space, been shaped and affected by electrons along the way, and had enough energy to be clearly detectable despite all the stuff it had passed through. There are no other signals we can currently detect that commonly occur throughout the universe and have this exact set of traits.

“I saw that and I said, ‘Holy cow—that’s how we can solve the missing-baryons problem,’” Prochaska says. Astronomers had used a similar technique with the light from pulsars—spinning neutron stars that beam radiation from their poles—to count electrons in the Milky Way. But pulsars are too dim to illuminate more of the universe. FRBs were thousands of times brighter, offering a way to use that technique to study space well beyond our galaxy.

This visualization of large-scale structure in the universe shows galaxies (bright knots) and the filaments of material between them.
NASA/NCSA UNIVERSITY OF ILLINOIS VISUALIZATION BY FRANK SUMMERS, SPACE TELESCOPE SCIENCE INSTITUTE, SIMULATION BY MARTIN WHITE AND LARS HERNQUIST, HARVARD UNIVERSITY

There’s a catch, though: in order for an FRB to be an indicator of what lies in the seemingly empty space between galaxies, researchers have to know where it comes from. If you don’t know how far the FRB has traveled, you can’t make any definitive estimate of what space looks like between its origin point and Earth. 

Astronomers couldn’t even point to the direction that the first 2007 FRB came from, let alone calculate the distance it had traveled. It was detected by an enormous single-dish radio telescope (since renamed Murriyang) at the Parkes Observatory in New South Wales, which is great at picking up incoming radio waves but can pinpoint FRBs only to an area of the sky as large as Earth’s full moon. For the next decade, telescopes continued to identify FRBs without providing a precise origin, making them a fascinating mystery but not practically useful.

Then, in 2015, one particular radio wave flashed—and then flashed again. Over the course of two months of observation with the Arecibo telescope in Puerto Rico, the radio waves came again and again, flashing 10 times. This was the first repeating FRB ever observed (a mystery in its own right), and it gave researchers a chance to home in on where the radio waves had begun.

In 2017, that’s what happened. The researchers obtained an accurate position for the fast radio burst using the NRAO Very Large Array telescope in central New Mexico. Armed with that position, the researchers then used the Gemini optical telescope in Hawaii to take a picture of the location, revealing the galaxy where the FRB had begun and how far it had traveled. “That’s when it became clear that at least some of these we’d get the distance for. That’s when I got really involved and started writing telescope proposals,” Prochaska says. 

That same year, astronomers from across the globe gathered in Aspen, Colorado, to discuss the potential for studying FRBs. Researchers debated what caused them. Neutron stars? Magnetars, neutron stars with such powerful magnetic fields that they emit x-rays and gamma rays? Merging galaxies? Aliens? Did repeating FRBs and one-offs have different origins, or could there be some other explanation for why some bursts repeat and most do not? Did it even matter, since all the bursts could be used as probes regardless of what caused them? At that Aspen meeting, Prochaska met with a team of radio astronomers based in Australia, including Keith Bannister, a telescope expert involved in the early work to build a precursor facility for the Square Kilometre Array, an international collaboration to build the largest radio telescope arrays in the world.

The construction of that precursor telescope, called ASKAP, was still underway during that meeting. But Bannister, a telescope expert at the Australian government’s scientific research agency, CSIRO, believed that it could be requisitioned and adapted to simultaneously locate and observe FRBs. 

Bannister and the other radio experts affiliated with ASKAP understood how to manipulate radio telescopes for the unique demands of FRB hunting; Prochaska was an expert in everything “not radio.” They agreed to work together to identify and locate one-off FRBs (because there are many more of these than there are repeating ones) and then use the data to address the problem of the missing baryons. 

And over the course of the next five years, that’s exactly what they did—with astonishing success.

Building a pipeline

To pinpoint a burst in the sky, you need a telescope with two things that have traditionally been at odds in radio astronomy: a very large field of view and high resolution. The large field of view gives you the greatest possible chance to detect a fleeting, unpredictable burst. High resolution  lets you determine where that burst actually sits in your field of view. 

ASKAP was the perfect candidate for the job. Located in the westernmost part of the Australian outback, where cattle and sheep graze on public land and people are few and far between, the telescope consists of 36 dishes, each with a large field of view. These dishes are separated by large distances, allowing observations to be combined through a technique called interferometry so that a small patch of the sky can be viewed with high precision.  

The dishes weren’t formally in use yet, but Bannister had an idea. He took them and jerry-rigged a “fly’s eye” telescope, pointing the dishes at different parts of the sky to maximize its ability to spot something that might flash anywhere. 

“Suddenly, it felt like we were living in paradise,” Bannister says. “There had only ever been three or four FRB detections at this point, and people weren’t entirely sure if [FRBs] were real or not, and we were finding them every two weeks.” 

When ASKAP’s interferometer went online in September 2018, the real work began. Bannister designed a piece of software that he likens to live-action replay of the FRB event. “This thing comes by and smacks into your telescope and disappears, and you’ve got a millisecond to get its phone number,” he says. To do so, the software detects the presence of an FRB within a hundredth of a second and then reaches upstream to create a recording of the telescope’s data before the system overwrites it. Data from all the dishes can be processed and combined to reconstruct a view of the sky and find a precise point of origin. 
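
Conceptually, that capture step works like a rolling buffer that gets frozen the instant a candidate burst is flagged. Below is a toy sketch of the idea in Python; the buffer size, trigger rule, and simulated data stream are hypothetical stand-ins, not ASKAP’s actual software.

```python
# A toy illustration of the "live-action replay" idea: keep a rolling window of
# recent samples and freeze a copy the moment something burst-like appears,
# before the data is overwritten. All numbers and the trigger rule are
# hypothetical stand-ins, not the real ASKAP pipeline.
import random
from collections import deque

BUFFER_SIZE = 10_000       # how many recent samples are held at any moment
TRIGGER_THRESHOLD = 8.0    # a sample this far above the noise counts as a candidate burst

def sample_stream(n):
    """Simulated noise with one injected spike standing in for a burst."""
    for i in range(n):
        yield 10.0 if i == 60_000 else random.gauss(0.0, 1.0)

buffer = deque(maxlen=BUFFER_SIZE)   # old samples fall off the back automatically
captured = None
for sample in sample_stream(100_000):
    buffer.append(sample)
    if captured is None and abs(sample) > TRIGGER_THRESHOLD:
        captured = list(buffer)      # snapshot taken before the buffer rolls over

print(len(captured))                 # 10,000 samples of context leading up to the spike
```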

The team can then send the coordinates on to optical telescopes, which can take detailed pictures of the spot to confirm the presence of a galaxy—the likely origin point of the FRB. 

These two dishes are part of CSIRO’s Australian Square Kilometre Array Pathfinder (ASKAP) telescope.
CSIRO

Ryder’s team used data on the galaxy’s spectrum, gathered from the European Southern Observatory, to measure how much its light stretched as it traversed space to reach our telescopes. This “redshift” becomes a proxy for distance, allowing astronomers to estimate just how much space the FRB’s light has passed through. 

In 2018, the live-action replay worked for the first time, making Bannister, Ryder, Prochaska, and the rest of their research team the first to localize an FRB that was not repeating. By the following year, the team had localized about five of them. By 2020, they had published a paper in Nature declaring that the FRBs had let them count up the universe’s missing baryons. 

The centerpiece of the paper’s argument was something called the dispersion measure—a number that reflects how much an FRB’s light has been smeared by all the free electrons along our line of sight. In general, the farther an FRB travels, the higher the dispersion measure should be. Armed with both the travel distance (the redshift) and the dispersion measure for a number of FRBs, the researchers found they could extrapolate the total density of particles in the universe. J-P Macquart, the paper’s lead author, believed that the relationship between dispersion measure and FRB distance was predictable and could be applied to map the universe.
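
For readers who want to see the idea in numbers, below is a rough numerical sketch of that relationship between mean dispersion measure and redshift, using standard textbook cosmological parameters. The specific values (the share of baryons in intergalactic space, free electrons per baryon, and so on) are assumptions chosen for illustration, not figures taken from the team’s papers.

```python
# A rough numerical sketch of the Macquart relation: the mean cosmic dispersion
# measure expected for a burst at redshift z. Parameter choices (flat LCDM,
# ~83% of baryons in the intergalactic medium, ~0.88 free electrons per baryon)
# are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

C = 2.998e8             # speed of light, m/s
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
M_P = 1.673e-27         # proton mass, kg
H0 = 70e3 / 3.086e22    # Hubble constant (~70 km/s/Mpc), in 1/s
OMEGA_B, OMEGA_M, OMEGA_L = 0.049, 0.31, 0.69
F_IGM, CHI_E = 0.83, 0.875
PC_PER_CM3_IN_M2 = 3.086e22   # 1 pc/cm^3 expressed as an electron column density in m^-2

def mean_dispersion_measure(z: float) -> float:
    """Mean cosmic dispersion measure out to redshift z, in pc/cm^3."""
    integrand = lambda zp: (1 + zp) / np.sqrt(OMEGA_M * (1 + zp) ** 3 + OMEGA_L)
    integral, _ = quad(integrand, 0.0, z)
    prefactor = 3 * C * OMEGA_B * H0 * F_IGM * CHI_E / (8 * np.pi * G * M_P)  # m^-2
    return prefactor * integral / PC_PER_CM3_IN_M2

print(round(mean_dispersion_measure(1.0)))  # roughly 900-1,000 for a burst at redshift 1
```

Under these assumptions, a burst at redshift one should arrive with a mean cosmic dispersion measure of roughly 1,000 parsecs per cubic centimeter, which is the sense in which the measured smearing can stand in for distance, and vice versa.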

As a leader in the field and a key player in the advancement of FRB research, Macquart would have been interviewed for this piece. But he died of a heart attack one week after the paper was published, at the age of 45. FRB researchers began to call the relationship between dispersion and distance the “Macquart relation,” in honor of his memory and his push for the groundbreaking idea that FRBs could be used for cosmology. 

Proving that the Macquart relation would hold at greater distances became not just a scientific quest but also an emotional one. 

“I remember thinking that I know something about the universe that no one else knows.”

The researchers knew that the ASKAP telescope was capable of detecting bursts from very far away—they just needed to find one. Whenever the telescope detected an FRB, Ryder was tasked with helping to determine where it had originated. It took much longer than he would have liked. But one morning in July 2022, after many months of frustration, Ryder downloaded the newest data email from the European Southern Observatory and began to scroll through the spectrum data. Scrolling, scrolling, scrolling—and then there it was: light from 8 billion years ago, or a redshift of one, symbolized by two very close, bright lines on the computer screen, showing the optical emissions from oxygen. “I remember thinking that I know something about the universe that no one else knows,” he says. “I wanted to jump onto a Slack and tell everyone, but then I thought: No, just sit here and revel in this. It has taken a lot to get to this point.” 

With the October 2023 Science paper, the team had basically doubled the distance baseline for the Macquart relation, honoring Macquart’s memory in the best way they knew how. The distance jump was significant because Ryder and the others on his team wanted to confirm that their work would hold true even for FRBs whose light comes from so far away that it reflects a much younger universe. They also wanted to establish that it was possible to find FRBs at this redshift, because astronomers need to collect evidence about many more like this one in order to create the cosmological map that motivates so much FRB research.

“It’s encouraging that the Macquart relation does still seem to hold, and that we can still see fast radio bursts coming from those distances,” Ryder said. “We assume that there are many more out there.” 

Mapping the cosmic web

The missing stuff that lies between galaxies, which should contain the majority of the matter in the universe, is often called the cosmic web. The diffuse gases aren’t floating like random clouds; they’re strung together more like a spiderweb, a complex weaving of delicate filaments that stretches as the galaxies at their nodes grow and shift. This gas probably escaped from galaxies into the space beyond when the galaxies first formed, shoved outward by massive explosions.

“We don’t understand how gas is pushed in and out of galaxies. It’s fundamental for understanding how galaxies form and evolve,” says Kiyoshi Masui, the director of MIT’s Synoptic Radio Lab. “We only exist because stars exist, and yet this process of building up the building blocks of the universe is poorly understood … Our ability to model that is the gaping hole in our understanding of how the universe works.” 

Astronomers are also working to build large-scale maps of galaxies in order to precisely measure the expansion of the universe. But the cosmological modeling underway with FRBs should create a picture of invisible gases between galaxies, one that currently does not exist. To build a three-dimensional map of this cosmic web, astronomers will need precise data on thousands of FRBs from regions near Earth and from very far away, like the FRB at redshift one. “Ultimately, fast radio bursts will give you a very detailed picture of how gas gets pushed around,” Masui says. “To get to the cosmological data, samples have to get bigger, but not a lot bigger.”

That’s the task at hand for Masui, who leads a team searching for FRBs much closer to our galaxy than the ones found by the Australian-led collaboration. Masui’s team conducts FRB research with the CHIME telescope in British Columbia, a nontraditional radio telescope with a very wide field of view and focusing reflectors that look like half-pipes instead of dishes. CHIME (short for “Canadian Hydrogen Intensity Mapping Experiment”) has no moving parts and is less reliant on mirrors than a traditional telescope (focusing light in only one direction rather than two), instead using digital techniques to process its data. CHIME can use its digital technology to focus on many places at once, creating a 200-square-degree field of view compared with ASKAP’s 30-square-degree one. Masui likened it to a mirror that can be focused on thousands of different places simultaneously.

Because of this enormous field of view, CHIME has been able to gather data on thousands of bursts that are closer to the Milky Way. While CHIME cannot yet precisely locate where they are coming from the way that ASKAP can (the telescope is much more compact, providing lower resolution), Masui is leading the effort to change that by building three smaller versions of the same telescope in British Columbia; Green Bank, West Virginia; and Northern California. The additional data provided by these telescopes, the first of which will probably be collected sometime this year, can be combined with data from the original CHIME telescope to produce location information that is about 1,000 times more precise. That should be detailed enough for cosmological mapping.

The reflectors of the Canadian Hydrogen Intensity Mapping Experiment, or CHIME, have been used to spot thousands of FRBs.
ANDRE RECNIK/CHIME

Telescope technology is improving so fast that the quest to gather enough FRB samples from different parts of the universe for a cosmological map could be finished within the next 10 years. In addition to CHIME, the BURSTT radio telescope in Taiwan should go online this year; the CHORD telescope in Canada, designed to surpass CHIME, should begin operations in 2025; and the Deep Synoptic Array in California could transform the field of radio astronomy when it’s finished, which is expected to happen sometime around the end of the decade. 

And at ASKAP, Bannister is building a new tool that will quintuple the sensitivity of the telescope, beginning this year. If you can imagine stuffing a million people simultaneously watching uncompressed YouTube videos into a box the size of a fridge, that’s probably the easiest way to visualize the data handling capabilities of this new processor, called a field-programmable gate array, which Bannister is almost finished programming. He expects the new device to allow the team to detect one new FRB each day.

With all the telescopes in competition, Bannister says, “in five or 10 years’ time, there will be 1,000 new FRBs detected before you can write a paper about the one you just found … We’re in a race to make them boring.” 

Prochaska is so confident FRBs will finally give us the cosmological map he’s been working toward his entire life that he’s started studying for a degree in oceanography. Once astronomers have measured distances for 1,000 of the bursts, he plans to give up the work entirely. 

“In a decade, we could have a pretty decent cosmological map that’s very precise,” he says. “That’s what the 1,000 FRBs are for—and I should be fired if we don’t.”

Unlike most scientists, Prochaska can define the end goal. He knows that all those FRBs should allow astronomers to paint a map of the invisible gases in the universe, creating a picture of how galaxies evolve as gases move outward and then fall back in. FRBs will grant us an understanding of the shape of the universe that we don’t have today—even if the mystery of what makes them endures. 

Anna Kramer is a science and climate journalist based in Washington, D.C.

Roundtables: Inside the Next Era of AI and Hardware

Recorded on April 30, 2024

Inside the Next Era of AI and Hardware

Speakers: James O’Donnell, AI reporter, and Charlotte Jee, News editor

Hear first-hand from our AI reporter, James O’Donnell, as he walks our news editor Charlotte Jee through the latest goings-on in his beat, from rapid advances in robotics to autonomous military drones, wearable devices, and tools for AI-powered surgeries.


The Download: robotics’ data bottleneck, and our AI afterlives

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The robot race is fueling a fight for training data

We’re interacting with AI tools more directly—and regularly—than ever before. Interacting with robots, by way of contrast, is still a rarity for most. But experts say that’s on the cusp of changing. 

Roboticists believe that, using new AI techniques, they can unlock more capable robots that can move freely through unfamiliar environments and tackle challenges they’ve never seen before.

But something is standing in the way: lack of access to the types of data used to train robots so they can interact with the physical world. It’s far harder to come by than the data used to train the most advanced AI models, and that scarcity is one of the main things currently holding progress in robotics back.

As a result, leading companies and labs are in fierce competition to find new and better ways to gather the data they need. It’s led them down strange paths, like using robotic arms to flip pancakes for hours on end. And they’re running into the same sorts of privacy, ethics, and copyright issues as their counterparts in the world of AI. Read the full story.

—James O’Donnell

My deepfake shows how valuable our data is in the age of AI

—Melissa Heikkilä

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It’s equally exciting and daunting thinking about where this technology is going. But it raises a big question: What happens to our data once we submit it to AI companies? Read the full story.

This story is from The Algorithm, our weekly AI newsletter. Sign up to receive it in your inbox every Monday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 AI startups without products can still raise millions
How some of them plan to make money is unclear, but that doesn’t deter investors. (WSJ $)
+ Those large AI models are wildly expensive to run. (Bloomberg $)
+ AI hype is built on high test scores. Those tests are flawed. (MIT Technology Review)

2 The EU says Meta isn’t doing enough to counter Russian disinformation
So it’s launching formal proceedings against the company ahead of EU elections. (The Guardian)
+ Three technology trends shaping 2024’s elections. (MIT Technology Review)

3 Meet the humans fighting back against algorithmic curation
The solution could, ironically, lie with different kinds of algorithms. (Wired $)

4 An AI blood test claims to diagnose postpartum depression
It says the presence of a gene that links moods more closely to hormonal changes is an indicator. (WP $)
+ An AI system helped to save lives in a hospital trial. (New Scientist $)

5 Tesla secretly tested its autonomous driving tech in San Francisco
Which hints that its previous ‘general solutions’ approach fell short. (The Information $)
+ Robotaxis are here. It’s time to decide what to do about them. (MIT Technology Review)

6 Why egg freezing has failed to live up to its hype
We’re finally getting a clearer picture of how effective the procedure is. (Vox)
+ I took an international trip with my frozen eggs to learn about the fertility industry. (MIT Technology Review)

7 NASA has finally solved a long-standing solar mystery 
The sun’s corona is far hotter than its surface. But why? (Quanta Magazine)

8 Do dating apps actually help you find your soulmate?
Chemistry and a great relationship are difficult to quantify. (The Guardian)
+ Here’s how the net’s newest matchmakers help you find love. (MIT Technology Review)

9 Online messaging has come a long way
BBS, anyone? (Ars Technica)

10 The three-year search for a synth-heavy pop song is over 
…But its origins are seedier than you’d expect. (404 Media)

Quote of the day

“This is the Oppenheimer Moment of our generation.”

—Alexander Schallenberg, Austria’s foreign minister, warns against granting AI too much autonomy on the battlefield during a summit in Vienna, Bloomberg reports.

The big story

Next slide, please: A brief history of the corporate presentation

August 2023

PowerPoint is everywhere. It’s used in religious sermons; by schoolchildren preparing book reports; at funerals and weddings. In 2010, Microsoft announced that PowerPoint was installed on more than a billion computers worldwide. 

But before PowerPoint, 35-millimeter film slides were king. They were the only medium for the kinds of high-impact presentations given by CEOs and top brass at annual meetings for stockholders, employees, and salespeople. 

Known in the business as “multi-image” shows, these presentations required a small army of producers, photographers, and live production staff to pull off. Read this story to delve into the fascinating, flashy history of corporate presentations.

—Claire L. Evans

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)

+ This is some seriously committed egg flipping. 🍳
+ How to spend time and make precious memories with the people you love.
+ Gen Z is on the move: to the US Midwest apparently.
+ Cool: these novels were all inspired by the authors’ day jobs.

My deepfake shows how valuable our data is in the age of AI

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

Deepfakes are getting good. Like, really good. Earlier this month I went to a studio in East London to get myself digitally cloned by the AI video startup Synthesia. They made a hyperrealistic deepfake that looked and sounded just like me, with realistic intonation. It is a long way away from the glitchiness of earlier generations of AI avatars. The end result was mind-blowing. It could easily fool someone who doesn’t know me well.

Synthesia has managed to create AI avatars that are remarkably humanlike after only one year of tinkering with the latest generation of generative AI. It’s equally exciting and daunting thinking about where this technology is going. It will soon be very difficult to differentiate between what is real and what is not, and this is a particularly acute threat given the record number of elections happening around the world this year. 

We are not ready for what is coming. If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. Researchers have called this the “liar’s dividend.” They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI. 

I just published a story on my deepfake creation experience, and on the big questions about a world where we increasingly can’t tell what’s real. Read it here.

But there is another big question: What happens to our data once we submit it to AI companies? Synthesia says it does not sell the data it collects from actors and customers, although it does release some of it for academic research purposes. The company uses avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company deletes their data.

But other companies are not that transparent about their intentions. As my colleague Eileen Guo reported last year, companies such as Meta license actors’ data—including their faces and expressions—in a way that allows the companies to do whatever they want with it. Actors are paid a small up-front fee, but their likeness can then be used to train AI models in perpetuity without their knowledge. 

Even if contracts for data are transparent, they don’t apply if you die, says Carl Öhman, an assistant professor at Uppsala University who has studied the online data left by deceased people and is the author of a new book, The Afterlife of Data. The data we input into social media platforms or AI models might end up benefiting companies and living on long after we’re gone. 

“Facebook is projected to host, within the next couple of decades, a couple of billion dead profiles,” Öhman says. “They’re not really commercially viable. Dead people don’t click on any ads, but they take up server space nevertheless,” he adds. This data could be used to train new AI models, or to make inferences about the descendants of those deceased users. The whole model of data and consent with AI presumes that both the data subject and the company will live on forever, Öhman says.

Our data is a hot commodity. AI language models are trained by indiscriminately scraping the web, and that also includes our personal data. A couple of years ago I tested to see if GPT-3, the predecessor of the language model powering ChatGPT, had anything on me. It struggled, but I found I was able to retrieve personal information about MIT Technology Review’s editor in chief, Mat Honan. 

High-quality, human-written data is crucial to training the next generation of powerful AI models, and we are on the verge of running out of free online training data. That’s why AI companies are racing to strike deals with news organizations and publishers to access their data treasure chests. 

Old social media sites are also a potential gold mine: when companies go out of business or platforms stop being popular, their assets, including users’ data, get sold to the highest bidder, says Öhman. 

“MySpace data has been bought and sold multiple times since MySpace crashed. And something similar may well happen to Synthesia, or X, or TikTok,” he says. 

Some people may not care much about what happens to their data, says Öhman. But securing exclusive access to high-quality data helps cement the monopoly position of large corporations, and that harms us all. This is something we need to grapple with as a society, he adds. 

Synthesia said it will delete my avatar after my experiment, but the whole experience did make me think of all the cringeworthy photos and posts that haunt me on Facebook and other social media platforms. I think it’s time for a purge.


Now read the rest of The Algorithm

Deeper Learning

Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk. A new tool created by Cleanlab, an AI startup spun out of MIT, is designed to provide a clearer sense of how trustworthy these models really are. 

A BS-o-meter for chatbots: Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. Read more from Will Douglas Heaven.
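
To make that concrete, here is a rough sketch of how a 0-to-1 trust score could be used to gate chatbot answers. The score_response function below is a placeholder invented for illustration; it is not Cleanlab’s actual API.

```python
# Hypothetical sketch: gating chatbot answers on a 0-to-1 trustworthiness score,
# in the spirit of Cleanlab's Trustworthy Language Model. Not Cleanlab's real API.

def score_response(prompt: str, response: str) -> float:
    """Placeholder: return a 0-to-1 reliability score for a model's answer."""
    # A real system would call a scoring model or service here.
    return 0.42


def answer_or_abstain(prompt: str, response: str, threshold: float = 0.8) -> str:
    score = score_response(prompt, response)
    if score >= threshold:
        return response
    return f"Low confidence ({score:.2f}): please verify this answer independently."


print(answer_or_abstain("When was the first Roomba released?", "2002"))
```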

Bits and Bytes

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan
President Joe Biden signed a $95 billion aid package into law last week. The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. (MIT Technology Review)

Rishi Sunak promised to make AI safe. Big Tech’s not playing ball.
The UK’s prime minister thought he secured a political win when he got AI power players to agree to voluntary safety testing with the UK’s new AI Safety Institute. Six months on, it turns out pinkie promises don’t go very far. OpenAI and Meta have not granted access to the AI Safety Institute to do prerelease safety testing on their models. (Politico)

Inside the race to find AI’s killer app
The AI hype bubble is starting to deflate as companies try to find a way to make profits out of the eye-wateringly expensive process of developing and running this technology. Tech companies haven’t solved some of the fundamental problems slowing its wider adoption, such as the fact that generative models constantly make things up. (The Washington Post)  

Why the AI industry’s thirst for new data centers can’t be satisfied
The current boom in data-hungry AI means there is now a shortage of parts, property, and power to build data centers. (The Wall Street Journal)

The friends who became rivals in Big Tech’s AI race
This story is a fascinating look into one of the most famous and fractious relationships in AI. Demis Hassabis and Mustafa Suleyman are old friends who grew up in London and went on to cofound AI lab DeepMind. Suleyman was ousted following a bullying scandal, went on to start his own short-lived startup, and now heads rival Microsoft’s AI efforts, while Hassabis still runs DeepMind, which is now Google’s central AI research lab. (The New York Times)

This creamy vegan cheese was made with AI
Startups are using artificial intelligence to design plant-based foods. The companies train algorithms on data sets of ingredients with desirable traits like flavor, scent, or stretchability. Then they use AI to comb troves of data to develop new combinations of those ingredients that perform similarly. (MIT Technology Review)

The robot race is fueling a fight for training data

Since ChatGPT was released, we now interact with AI tools more directly—and regularly—than ever before. 

But interacting with robots, by way of contrast, is still a rarity for most. If you don’t undergo complex surgery or work in logistics, the most advanced robot you encounter in your daily life might still be a vacuum cleaner (if you’re feeling young, the first Roomba was released 22 years ago). 

But that’s on the cusp of changing. Roboticists believe that by using new AI techniques, they will achieve something the field has pined after for decades: more capable robots that can move freely through unfamiliar environments and tackle challenges they’ve never seen before. 

“It’s like being strapped to the front of a rocket,” Russ Tedrake, vice president of robotics research at the Toyota Research Institute, says of the field’s pace right now. Tedrake says he has seen plenty of hype cycles rise and fall, but none like this one. “I’ve been in the field for 20-some years. This is different,” he says. 

But something is slowing that rocket down: lack of access to the types of data used to train robots so they can interact more smoothly with the physical world. It’s far harder to come by than the data used to train the most advanced AI models like GPT—mostly text, images, and videos scraped off the internet. Simulation programs can help robots learn how to interact with places and objects, but the results still tend to fall prey to what’s known as the “sim-to-real gap,” or failures that arise when robots move from the simulation to the real world. 

For now, we still need access to physical, real-world data to train robots. That data is relatively scarce and tends to require a lot more time, effort, and expensive equipment to collect. That scarcity is one of the main things currently holding progress in robotics back. 

As a result, leading companies and labs are in fierce competition to find new and better ways to gather the data they need. It’s led them down strange paths, like using robotic arms to flip pancakes for hours on end, watching thousands of hours of graphic surgery videos pulled from YouTube, or deploying researchers to numerous Airbnbs in order to film every nook and cranny. Along the way, they’re running into the same sorts of privacy, ethics, and copyright issues as their counterparts in the world of chatbots. 

The new need for data

For decades, robots were trained on specific tasks, like picking up a tennis ball or doing a somersault. While humans learn about the physical world through observation and trial and error, many robots were learning through equations and code. This method was slow, but even worse, it meant that robots couldn’t transfer skills from one task to a new one. 

But now, AI advances are fast-tracking a shift that had already begun: letting robots teach themselves through data. Just as a language model can learn from a library’s worth of novels, robot models can be shown a few hundred demonstrations of a person washing ketchup off a plate using robotic grippers, for example, and then imitate the task without being taught explicitly what ketchup looks like or how to turn on the faucet. This approach is bringing faster progress and machines with much more general capabilities. 
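
For readers who want a feel for what “learning from demonstrations” means in practice, here is a minimal behavior-cloning sketch. The network, data shapes, and training details are illustrative assumptions, not any particular lab’s recipe.

```python
# Minimal behavior-cloning sketch (illustrative only): a policy network learns
# to map observations to actions by imitating recorded demonstrations.
import torch
from torch import nn

obs_dim, act_dim = 32, 7  # assumed sizes: observation features in, 7-DoF arm command out

policy = nn.Sequential(
    nn.Linear(obs_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a few hundred teleoperated demonstrations: (observation, action) pairs.
demo_obs = torch.randn(500, obs_dim)
demo_act = torch.randn(500, act_dim)

for epoch in range(10):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)  # imitate the demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```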

Now every leading company and lab is trying to enable robots to reason their way through new tasks using AI. Whether they succeed will hinge on whether researchers can find enough diverse types of data to fine-tune models for robots, as well as novel ways to use reinforcement learning to let them know when they’re right and when they’re wrong. 

“A lot of people are scrambling to figure out what’s the next big data source,” says Pras Velagapudi, chief technology officer of Agility Robotics, which makes a humanoid robot that operates in warehouses for customers including Amazon. The answers to Velagapudi’s question will help define what tomorrow’s machines will excel at, and what roles they may fill in our homes and workplaces. 

Prime training data

To understand how roboticists are shopping for data, picture a butcher shop. There are prime, expensive cuts ready to be cooked. There are the humble, everyday staples. And then there’s the case of trimmings and off-cuts lurking in the back, requiring a creative chef to make them into something delicious. They’re all usable, but they’re not all equal.

For a taste of what prime data looks like for robots, consider the methods adopted by the Toyota Research Institute (TRI). Amid a sprawling laboratory in Cambridge, Massachusetts, equipped with robotic arms, computers, and a random assortment of everyday objects like dustpans and egg whisks, researchers teach robots new tasks through teleoperation, creating what’s called demonstration data. A human might use a robotic arm to flip a pancake 300 times in an afternoon, for example.

The model processes that data overnight, and then often the robot can perform the task autonomously the next morning, TRI says. Since the demonstrations show many iterations of the same task, teleoperation creates rich, precisely labeled data that helps robots perform well in new tasks.

The trouble is, creating such data takes ages, and it’s also limited by the number of expensive robots you can afford. To create quality training data more cheaply and efficiently, Shuran Song, head of the Robotics and Embodied AI Lab at Stanford University, designed a handheld device that can be wielded more nimbly and built at a fraction of the cost. Essentially a lightweight plastic gripper, it can collect data while you use it for everyday activities like cracking an egg or setting the table. The data can then be used to train robots to mimic those tasks. Using simpler devices like this could fast-track the data collection process.
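
What makes this kind of teleoperation data “rich and precisely labeled” is that every timestep pairs what the robot sensed with exactly what the human operator did. A hypothetical record might look like the sketch below; the field names are invented for illustration, not TRI’s or Stanford’s actual format.

```python
# Hypothetical record for one timestep of a teleoperated demonstration
# (field names are illustrative, not any specific lab's schema).
from dataclasses import dataclass


@dataclass
class DemoStep:
    timestamp: float               # seconds since the start of the episode
    camera_image: bytes            # raw frame from a wrist or scene camera
    joint_positions: list[float]   # the arm's state at this instant
    gripper_open: bool             # whether the gripper is open
    operator_action: list[float]   # the command the human teleoperator sent


# An episode is an ordered list of such steps for one task attempt,
# for example, one of the 300 pancake flips recorded in an afternoon.
episode: list[DemoStep] = []
```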

Open-source efforts

Roboticists have recently alighted upon another method for getting more teleoperation data: sharing what they’ve collected with each other, thus saving them the laborious process of creating data sets alone. 

The Distributed Robot Interaction Dataset (DROID), published last month, was created by researchers at 13 institutions, including companies like Google DeepMind and top universities like Stanford and Carnegie Mellon. It contains 350 hours of data generated by humans doing tasks ranging from closing a waffle maker to cleaning up a desk. Since the data was collected using hardware that’s common in the robotics world, researchers can use it to create AI models and then test those models on equipment they already have. 

The effort builds on the success of the Open X-Embodiment Collaboration, a similar project from Google DeepMind that aggregated data on 527 skills, collected from a variety of different types of hardware. The data set helped build Google DeepMind’s RT-X model, which can turn text instructions (for example, “Move the apple to the left of the soda can”) into physical movements. 
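
Conceptually, a model like RT-X exposes a very simple interface: a text instruction plus the current camera frame go in, and a low-level motor command comes out. The sketch below shows that interface only; the internals and the seven-number action format are assumptions for illustration, not DeepMind’s actual code.

```python
# Simplified sketch of a language-conditioned robot policy interface,
# in the spirit of models like RT-X (names and shapes are assumptions).
import numpy as np


def policy(instruction: str, camera_image: np.ndarray) -> np.ndarray:
    """Map a text instruction plus the current camera frame to a motor command.

    A real model encodes both inputs with a large transformer; this placeholder
    returns a zero 7-number action: [dx, dy, dz, droll, dpitch, dyaw, gripper].
    """
    return np.zeros(7)


action = policy("Move the apple to the left of the soda can",
                np.zeros((224, 224, 3), dtype=np.uint8))
```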

Robotics models built on open-source data like this can be impressive, says Lerrel Pinto, a researcher who runs the General-purpose Robotics and AI Lab at New York University. But they can’t perform across a wide enough range of use cases to compete with proprietary models built by leading private companies. What is available via open source is simply not enough for labs to successfully build models at a scale that would produce the gold standard: robots that have general capabilities and can receive instructions through text, image, and video.

“The biggest limitation is the data,” he says. Only wealthy companies have enough. 

These companies’ data advantage is only getting more thoroughly cemented over time. In their pursuit of more training data, private robotics companies with large customer bases have a not-so-secret weapon: their robots themselves are perpetual data-collecting machines.

Covariant, a robotics company founded in 2017 by OpenAI researchers, deploys robots trained to identify and pick items in warehouses for companies like Crate & Barrel and Bonprix. These machines constantly collect footage, which is then sent back to Covariant. Every time the robot fails to pick up a bottle of shampoo, for example, it becomes a data point to learn from, and the model improves its shampoo-picking abilities for next time. The result is a massive, proprietary data set collected by the company’s own machines. 

This data set is part of why earlier this year Covariant was able to release a powerful foundation model, as AI models capable of a variety of uses are known. Customers can now communicate with its commercial robots much as you’d converse with a chatbot: you can ask questions, show photos, and instruct it to take a video of itself moving an item from one crate to another. These customer interactions with the model, which is called RFM-1, then produce even more data to help it improve.

Peter Chen, cofounder and CEO of Covariant, says exposing the robots to a number of different objects and environments is crucial to the model’s success. “We have robots handling apparel, pharmaceuticals, cosmetics, and fresh groceries,” he says. “It’s one of the unique strengths behind our data set.” Up next will be bringing its fleet into more sectors and even having the AI model power different types of robots, like humanoids, Chen says.

Learning from video

The scarcity of high-quality teleoperation and real-world data has led some roboticists to propose bypassing that collection method altogether. What if robots could just learn from videos of people?

Such video data is easier to produce, but unlike teleoperation data, it lacks “kinematic” data points, which plot the exact movements of a robotic arm as it moves through space. 

Researchers from the University of Washington and Nvidia have created a workaround, building a mobile app that lets people train robots using augmented reality. Users take videos of themselves completing simple tasks with their hands, like picking up a mug, and the AR program can translate the results into waypoints for the robotics software to learn from. 
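
The underlying idea is straightforward: once the hand has been tracked frame by frame, the trajectory can be thinned into a handful of target poses for the robot to follow. The sketch below shows only that downsampling step; the hand tracking itself, and the app’s real pipeline, are assumed away.

```python
# Illustrative sketch: downsample a tracked hand trajectory from video into a
# few waypoints a robot arm could follow. The AR hand tracking is assumed done.
import numpy as np


def extract_waypoints(hand_positions: np.ndarray, num_waypoints: int = 5) -> np.ndarray:
    """hand_positions: (T, 3) array of tracked hand x/y/z positions, one row per frame."""
    indices = np.linspace(0, len(hand_positions) - 1, num_waypoints).astype(int)
    return hand_positions[indices]


# e.g. 120 frames of a hand picking up a mug -> 5 target poses for a gripper
trajectory = np.cumsum(np.random.randn(120, 3) * 0.01, axis=0)
print(extract_waypoints(trajectory))
```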

Meta AI is pursuing a similar collection method on a larger scale through its Ego4D project, a data set of more than 3,700 hours of video taken by people around the world doing everything from laying bricks to playing basketball to kneading bread dough. The data set is broken down by task and contains thousands of annotations, which detail what’s happening in each scene, like when a weed has been removed from a garden or a piece of wood is fully sanded.

Learning from video data means that robots can encounter a much wider variety of tasks than they could if they relied solely on human teleoperation (imagine folding croissant dough with robot arms). That’s important, because just as powerful language models need complex and diverse data to learn, roboticists can create their own powerful models only if they expose robots to thousands of tasks.

To that end, some researchers are trying to wring useful insights from a vast but low-quality source of data: YouTube. With hundreds of hours of video uploaded every minute, there is no shortage of available content. The trouble is that most of it is pretty useless for a robot, because it isn’t labeled with the types of information robots need, like annotations or kinematic data. 

“You can say [to a robot], Oh, this is a person playing Frisbee with their dog,” says Chen, of Covariant, imagining a typical video that might be found on YouTube. “But it’s very difficult for you to say, Well, when this person throws a Frisbee, this is the acceleration and the rotation and that’s why it flies this way.”

Nonetheless, a few attempts have proved promising. When he was a postdoc at Stanford, AI researcher Emmett Goodman looked into how AI could be brought into the operating room to make surgeries safer and more predictable. Lack of data quickly became a roadblock. In laparoscopic surgeries, surgeons often use robotic arms to manipulate surgical tools inserted through very small incisions in the body. Those robotic arms have cameras capturing footage that can help train models, once personally identifying information has been removed from the data. In more traditional open surgeries, on the other hand, surgeons use their hands instead of robotic arms. That produces much less data to build AI models with. 

“That is the main barrier to why open-surgery AI is the slowest to develop,” he says. “How do you actually collect that data?”

To tackle that problem, Goodman trained an AI model on thousands of hours of open-surgery videos, taken by doctors with handheld or overhead cameras, that his team gathered from YouTube (with identifiable information removed). His model, as described in a paper in the medical journal JAMA in December 2023, could then identify segments of the operations from the videos. This laid the groundwork for creating useful training data, though Goodman admits that the barriers to doing so at scale, like patient privacy and informed consent, have not been overcome. 

Uncharted legal waters

Chances are that wherever roboticists turn for their new troves of training data, they’ll at some point have to wrestle with some major legal battles. 

The makers of large language models are already having to navigate questions of credit and copyright. A lawsuit filed by the New York Times alleges that ChatGPT copies the expressive style of its stories when generating text. OpenAI’s chief technology officer recently made headlines when she said the company’s video generation tool Sora was trained on publicly available data, sparking a critique from YouTube’s CEO, who said that if Sora learned from YouTube videos, it would violate the platform’s terms of service.

“It is an area where there’s a substantial amount of legal uncertainty,” says Frank Pasquale, a professor at Cornell Law School. If robotics companies want to join other AI companies in using copyrighted works in their training sets, it’s unclear whether that’s allowed under the fair-use doctrine, which permits copyrighted material to be used without permission in a narrow set of circumstances. An example often cited by tech companies and those sympathetic to their view is the 2015 case of Google Books, in which courts found that Google did not violate copyright laws in making a searchable database of millions of books. That legal precedent may tilt the scales slightly in tech companies’ favor, Pasquale says.

It’s far too soon to tell whether legal challenges will slow down the robotics rocket ship, since AI-related cases are sprawling and still undecided. But it’s safe to say that roboticists scouring YouTube or other internet video sources for training data will be wading in fairly uncharted waters.

The next era

Not every roboticist feels that data is the missing link for the next breakthrough. Some argue that if we build a good enough virtual world for robots to learn in, maybe we don’t need training data from the real world at all. Why go through the effort of training a pancake-flipping robot in a real kitchen, for example, if it could learn through a digital simulation of a Waffle House instead?

Roboticists have long used simulator programs, which digitally replicate the environments that robots navigate through, often down to details like the texture of the floorboards or the shadows cast by overhead lights. But as powerful as they are, roboticists using these programs to train machines have always had to work around that sim-to-real gap. 

Now the gap might be shrinking. Advanced image generation techniques and faster processing are allowing simulations to look more like the real world. Nvidia, which leveraged its experience in video game graphics to build the leading robotics simulator, Isaac Sim, announced last month that major humanoid robotics companies like Figure and Agility are using its program to build foundation models. These companies build virtual replicas of their robots in the simulator and then unleash them to explore a range of new environments and tasks.
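
A core trick in simulation training is domain randomization: vary the virtual world every episode so the policy doesn’t overfit to one rendering of reality. The sketch below illustrates the idea generically; the parameter names and ranges are made up, and this is not Isaac Sim’s API.

```python
# Generic domain-randomization loop (not Isaac Sim's actual API): vary the
# simulated world each episode so policies don't overfit to one environment.
import random


def randomized_environment() -> dict:
    """Sample one set of simulation parameters (names and ranges are made up)."""
    return {
        "floor_friction": random.uniform(0.4, 1.0),
        "light_intensity": random.uniform(0.5, 1.5),
        "object_mass_kg": random.uniform(0.1, 2.0),
        "camera_jitter_deg": random.uniform(0.0, 3.0),
    }


for episode in range(3):
    env = randomized_environment()
    # A real pipeline would reset the simulator with these parameters,
    # roll out the current policy, and use the result to update it.
    print(f"episode {episode}: {env}")
```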

Deepu Talla, vice president of robotics and edge computing at Nvidia, doesn’t hold back in predicting that this way of training will nearly replace the act of training robots in the real world. It’s simply far cheaper, he says.

“It’s going to be a million to one, if not more, in terms of how much stuff is going to be done in simulation,” he says. “Because we can afford to do it.”

But even if models can solve some of the “cognitive” problems, like learning new tasks, a host of challenges remain in realizing that success in an effective and safe physical form, says Aaron Saunders, chief technology officer of Boston Dynamics. We’re a long way from building hardware that can sense different types of materials, scrub and clean, or apply a gentle amount of force.

“There’s still a massive piece of the equation around how we’re going to program robots to actually act on all that information to interact with that world,” he says.

If we solved that problem, what would the robotic future look like? We could see nimble robots that help people with physical disabilities move through their homes, autonomous drones that clean up pollution or hazardous waste, or surgical robots that make microscopic incisions, leading to operations with a reduced risk of complications. For all these optimistic visions, though, more controversial ones are already brewing. The use of AI by militaries worldwide is on the rise, and the emergence of autonomous weapons raises troubling questions.

The labs and companies poised to lead in the race for data include, at the moment, the humanoid-robot startups beloved by investors (Figure AI was recently boosted by a $675 million funding round), commercial companies with sizable fleets of robots collecting data, and drone companies buoyed by significant military investment. Meanwhile, smaller academic labs are doing more with less to create data sets that rival those available to Big Tech. 

But what’s clear to everyone I speak with is that we’re at the very beginning of the robot data race. Since the correct way forward is far from obvious, all roboticists worth their salt are pursuing any and all methods to see what sticks.

There “isn’t really a consensus” in the field, says Benjamin Burchfiel, a senior research scientist in robotics at TRI. “And that’s a healthy place to be.”

The Download: inside the US defense tech aid package, and how AI is improving vegan cheese

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Here’s the defense tech at the center of US aid to Israel, Ukraine, and Taiwan

After weeks of drawn-out congressional debate over how much the United States should spend on conflicts abroad, President Joe Biden signed a $95 billion aid package into law last week.

The bill will send a significant quantity of supplies to Ukraine and Israel, while also supporting Taiwan with submarine technology to aid its defenses against China. It’s also sparked renewed calls for stronger crackdowns on Iranian-produced drones. 

James O’Donnell, our AI reporter, spoke to Andrew Metrick, a fellow with the defense program at the Center for a New American Security, a think tank, to discuss how the spending bill provides a window into US strategies around four key defense technologies with the power to reshape how today’s major conflicts are being fought. Read the full story.

This piece is part of MIT Technology Review Explains: a series delving into the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

Hear more about how AI intersects with hardware

Hear first-hand from James in our latest subscribers-only Roundtables session, as he walks news editor Charlotte Jee through the latest goings-on in his beat, from rapid advances in robotics to autonomous military drones, wearable devices, and tools for AI-powered surgeries. Register now to join the discussion tomorrow at 11:30am ET.

Check out some more of James’ reporting:

+ Inside a Californian startup’s herculean efforts to bring a small slice of the chipmaking supply chain back to the US.

+ An OpenAI spinoff has built an AI model that helps robots learn tasks like humans.
But can it graduate from the lab to the warehouse floor? Read the full story.

+ Watch this robot as it learns to stitch up wounds all on its own.

+ A new satellite will use Google’s AI to map methane leaks from space. It could help to form the most detailed portrait yet of methane emissions—but companies and countries will actually have to act on the data.

This creamy vegan cheese was made with AI

Most vegan cheese falls into an edible uncanny valley full of discomforting not-quite-right versions of the real thing. But machine learning is ushering in a new age of completely vegan cheese that’s much closer in taste and texture to traditional fromage.

Several startups are using AI to design plant-based foods including cheese, training algorithms on datasets of ingredients with desirable traits like flavor, scent, or stretchability. Then they use AI to comb troves of data to develop new combinations of those ingredients that perform similarly. But not everyone in the industry is bullish about AI-assisted ingredient discovery. Read the full story.

—Andrew Rosenblum

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Tesla has struck a deal to bring its self-driving tech to China 
It’ll use mapping and navigation functions from Chinese self-driving company Baidu. (WSJ $)
+ Tesla is facing at least eight legal cases over the tech in the next year. (WP $)
+ It’s also struggling with a major union issue in Sweden. (Bloomberg $)
+ Baidu’s self-driving cars have been on Beijing’s streets for years. (MIT Technology Review)

2 OpenAI will train its models on a paywalled British newspaper’s articles
ChatGPT will include links to Financial Times articles in its future responses. (FT $)
+ We could run out of data to train AI language programs. (MIT Technology Review)

3 This summer could be our hottest yet
Extreme weather events are likely to be on the horizon across the globe. (Vox)
+ One of the biggest untapped resources of renewable energy? Tidal power. (Undark Magazine)
+ Here’s how much heat your body can take. (MIT Technology Review)

4 The UK institute that helped popularize effective altruism has shut down
The philosophies it championed are deeply divisive. (The Guardian)
+ Inside effective altruism, where the far future counts a lot more than the present. (MIT Technology Review)

5 Human soldiers aren’t sure how to feel about their robot counterparts
Some teams get attached to their bots. Others hate them. (IEEE Spectrum)
+ Inside the messy ethics of making war with machines. (MIT Technology Review)

6 The US and China are locked in a race to build ultrafast submarines
But China’s claims that it’s made a laser breakthrough may be overblown. (Insider $)

7 Recruiters are fighting an influx of AI job applications
Tech roles are few and far between, and generative AI is making it easier to mass-apply for what’s available. (Wired $)
+ African universities aren’t preparing graduates for work in the age of AI. (Rest of World)

8 This firm uses a robotic arm to chisel marble sculptures
But it still needs a helping hand from humans. (Bloomberg $)

9 Our email accounts are modern-day diaries
They’re an instantly searchable record of our lives. (NY Mag $)

10 TikTok has fallen in love with Super 8 cameras 🎥
Even though they’re prohibitively expensive. (WSJ $)
+ Gen Z is ditching smartphones in favor of simpler devices. (The Guardian)

Quote of the day

“I have little in common with people who take cold plunges and want to live forever.”

—Ethan Mollick, a business school professor at the University of Pennsylvania who advises major companies and policymakers about AI, insists to the Wall Street Journal that he is far from the Silicon Valley tech-bro stereotype.

The big story

How big science failed to unlock the mysteries of the human brain

August 2021

In September 2011, Columbia University neurobiologist Rafael Yuste and Harvard geneticist George Church made a not-so-modest proposal: to map the activity of the entire human brain.

That knowledge could be harnessed to treat brain disorders like Alzheimer’s, autism, schizophrenia, depression, and traumatic brain injury, and help answer one of the great questions of science: How does the brain bring about consciousness?

A decade on, the US project has wound down, and the EU project faces its deadline to build a digital brain. So have we begun to unwrap the secrets of the human brain? Or have we spent a decade and billions of dollars chasing a vision that remains as elusive as ever? Read the full story.

—Emily Mullin

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)

+ I hope Fat Albert the polar bear is doing well.
+ Classic novels can’t please everyone—even if they’re classics for a reason.
+ Turns out we may have been mishearing Neil Armstrong’s famous first words as he set foot on the moon.
+ Hang onto those DVDs, you never know when Netflix is going to fail you. 📀