Thor Olavsrud
Senior Writer

10 famous AI disasters

Feature
Apr 17, 2024
13 mins
Artificial Intelligence, Generative AI, Machine Learning

Insights from data and machine learning algorithms can be invaluable, but mistakes can be irreversible. These recent high-profile AI blunders illustrate what can go wrong.

In 2017, The Economist declared that data, rather than oil, had become the world’s most valuable resource. The refrain has been repeated ever since. Organizations across every industry have invested heavily in data and analytics, and continue to do so. But like oil, data and analytics have their dark side.

According to CIO’s State of the CIO 2023 report, 26% of IT leaders say machine learning (ML) and AI will drive the most IT investment. And while actions driven by ML algorithms can give organizations a competitive advantage, mistakes can be costly in terms of reputation, revenue, or even lives.

Understanding your data and what it’s telling you is important, but it’s equally vital to understand your tools and keep your organization’s values firmly in mind.

Here are a handful of high-profile AI blunders from the past decade to illustrate what can go wrong.

Air Canada pays damages for chatbot lies

In February 2024, Air Canada was ordered to pay damages to a passenger after its virtual assistant gave him incorrect information at a particularly difficult time.

Jake Moffatt consulted Air Canada’s virtual assistant about bereavement fares following the death of his grandmother in November 2022. The chatbot told him he could buy a regular-price ticket from Vancouver to Toronto and apply for a bereavement discount within 90 days of purchase. Following that advice, Moffatt purchased a one-way CA$794.98 ticket to Toronto and a CA$845.38 return flight to Vancouver.

But when Moffatt submitted his refund claim, the airline turned him down, saying that bereavement fares can’t be claimed after tickets have been purchased.

Moffatt took Air Canada to a tribunal in Canada, claiming the airline was negligent and misrepresented information via its virtual assistant. According to tribunal member Christopher Rivers, Air Canada argued it can’t be held liable for the information provided by its chatbot.

Rivers rejected that argument, saying the airline didn’t take “reasonable care to ensure its chatbot was accurate.” He ordered the airline to pay Moffatt CA$812.02, including CA$650.88 in damages.

Sports Illustrated may have published AI-generated writers

In November 2023, online magazine Futurism said Sports Illustrated was publishing articles by AI-generated writers.

Futurism cited anonymous sources involved in creating the content, and said the storied sports magazine published “a lot” of fake authors, with some articles under those fake authors’ bylines generated by AI as well.

The online magazine found the author headshots in question listed on a site that sells AI-generated portraits. Futurism then reached out to The Arena Group, publisher of Sports Illustrated, and in a statement, Arena Group said the articles in question were licensed content from a third party, AdVon Commerce.

“We continually monitor our partners and were in the midst of a review when these allegations were raised,” Arena Group said in the statement provided to Futurism. “AdVon has assured us that all of the articles in question were written and edited by humans.”

The statement also said that AdVon writers used pen names or pseudonyms in certain articles, noting that Arena Group doesn’t condone those actions. Arena Group subsequently removed the articles in question from the Sports Illustrated website.

Responding to the Futurism article, the Sports Illustrated Union posted a statement that it was horrified by the allegations and demanded answers and transparency from Arena Group management.

“If true, these practices violate everything we believe in about journalism,” the SI Union said in its statement. “We deplore being associated with something so disrespectful to our readers.”

Gannett AI flubs high school sports articles

In August 2023, newspaper chain Gannett announced it would pause the use of an AI tool called LedeAI after several dispatches written by the AI went viral for being repetitive, poorly written, and lacking key details.

CNN pointed to one example, preserved by the Internet Archive’s Wayback Machine, which opened with, “The Worthington Christian [[WINNING_TEAM_MASCOT]] defeated the Westerville North [[LOSING_TEAM_MASCOT]] 2-1 in an Ohio boys soccer game on Saturday.”
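The unfilled `[[WINNING_TEAM_MASCOT]]` tokens suggest a template whose placeholders were never substituted. As a minimal sketch, and not a description of how LedeAI actually works, a publishing pipeline could refuse to release any text that still contains such tokens. The function names and the double-bracket pattern below are assumptions made for illustration.

```python
import re

# Assumed placeholder syntax, based only on the tokens visible in the quoted sentence.
PLACEHOLDER = re.compile(r"\[\[([A-Z0-9_]+)\]\]")


def render(template: str, values: dict[str, str]) -> str:
    """Fill [[NAME]] tokens from values; leave unknown tokens untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def unresolved_placeholders(text: str) -> list[str]:
    """Return the names of any [[NAME]] tokens still present in text."""
    return PLACEHOLDER.findall(text)


def publishable(text: str) -> bool:
    """Hypothetical release gate: block copy that still contains template tokens."""
    return not unresolved_placeholders(text)
```

Run against the sentence CNN quoted, `publishable` would return `False`, which is the kind of signal that could route a story to a human editor instead of to print.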

CNN found similar stories written by LedeAI in other local Gannett papers, including the Louisville Courier Journal, AZ Central, Florida Today, and the Milwaukee Journal Sentinel.

After the stories were roundly mocked on social media, Gannett opted to pause the use of LedeAI in all local markets that had been using the service.

In a statement to CNN, LedeAI CEO Jay Allred expressed regret and promised an around-the-clock effort to correct the problems.

iTutor Group’s recruiting AI rejects applicants due to age

In August 2023, tutoring company iTutor Group agreed to pay $365,000 to settle a suit brought by the US Equal Employment Opportunity Commission (EEOC). The federal agency said the company, which provides remote tutoring services to students in China, used AI-powered recruiting software that automatically rejected female applicants ages 55 and older, and male applicants ages 60 and older.

The EEOC said more than 200 qualified applicants were automatically rejected by the software.

“Age discrimination is unjust and unlawful,” EEOC Chair Charlotte A. Burrows said in a statement. “Even when technology automates the discrimination, the employer is still responsible.”

iTutor Group denied any wrongdoing but did decide to settle the suit. As part of the settlement and consent decree, it agreed to adopt new anti-discrimination policies.

ChatGPT hallucinates court cases

Advances made in 2023 by large language models (LLMs) have stoked widespread interest in the transformative potential of generative AI across nearly every industry. OpenAI’s ChatGPT has been at the center of this surge in interest, foreshadowing how gen AI holds the power to disrupt the nature of work in nearly every corner of business.

But the technology still has a long way to go before it can reliably take over most business processes, as attorney Steven A. Schwartz learned when he found himself in hot water with US District Judge P. Kevin Castel in 2023 after using ChatGPT to research precedents in a suit against Colombian airline Avianca.

Schwartz, an attorney with Levidow, Levidow & Oberman, used the OpenAI generative AI chatbot to find prior cases to support a case filed by Avianca passenger Roberto Mata for injuries he sustained in 2019. The problem? At least six of the cases submitted in the brief didn’t exist. In a document filed in May, Judge Castel noted the cases submitted by Schwartz included false names and docket numbers, along with bogus internal citations and quotes. Schwartz’s partner, Peter LoDuca, was Mata’s lawyer of record and signed the brief, putting himself in jeopardy as well.

In an affidavit, Schwartz told the court it was the first time he had used ChatGPT as a legal research source and that he was “unaware of the possibility that its content could be false.” He admitted he hadn’t confirmed the sources provided by the AI chatbot. He also said he “greatly regrets having utilized generative artificial intelligence to supplement the legal research performed herein, and will never do so in the future without absolute verification of its authenticity.”

In June 2023, Judge Castel imposed a $5,000 fine on Schwartz and LoDuca. In a separate ruling in June, Judge Castel dismissed Mata’s lawsuit against Avianca.

AI algorithms identify everything but COVID-19

Since the COVID-19 pandemic began in 2020, numerous organizations have sought to apply ML algorithms to help hospitals diagnose or triage patients faster. But according to the UK’s Turing Institute, a national center for data science and AI, the predictive tools made little to no difference.

MIT Technology Review has chronicled a number of failures, most of which stem from errors in the way the tools were trained or tested. The use of mislabeled data, or data from unknown sources, was a common culprit.

Derek Driggs, an ML researcher at the University of Cambridge, together with his colleagues, published a paper in Nature Machine Intelligence that explored the use of deep learning models for diagnosing the virus. The paper determined the technique wasn’t fit for clinical use. For example, Driggs’ group found that their own model was flawed because it was trained on a data set that included scans of patients who were lying down while scanned, and patients who were standing up. The patients who were lying down were much more likely to be seriously ill, so the algorithm learned to identify COVID risk based on the position of the person in the scan.

In a similar example, an algorithm was trained with a data set that included chest scans of healthy children. The algorithm learned to identify children, not high-risk patients.
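One way to catch this kind of shortcut is to test whether a nuisance attribute alone predicts the label. The sketch below is a hypothetical check with invented records, not the method from the paper: it compares the rate of severe cases among scans taken lying down and scans taken standing up.

```python
from collections import defaultdict


def positive_rate_by_group(records: list[tuple[str, int]]) -> dict[str, float]:
    """Map each group value to the fraction of its records labeled 1."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}


# Invented (position, severe_case) pairs for illustration only.
scans = [
    ("lying", 1), ("lying", 1), ("lying", 1), ("lying", 0),
    ("standing", 0), ("standing", 0), ("standing", 0), ("standing", 1),
]
rates = positive_rate_by_group(scans)
```

A large gap between the two rates means a model could score well by reading position rather than pathology, which is a cue to rebalance or stratify the training data before trusting the results.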

Zillow wrote down millions, slashed workforce due to algorithmic home-buying disaster

In November 2021, online real estate marketplace Zillow told shareholders it would wind down its Zillow Offers operations and cut 25% of the company’s workforce — about 2,000 employees — over the next several quarters. The home-flipping unit’s woes were the result of the error rate in the ML algorithm it used to predict home prices.

Zillow Offers was a program through which the company made cash offers on properties based on a “Zestimate” of home values derived from an ML algorithm. The idea was to renovate the properties and flip them quickly. But a Zillow spokesperson told CNN the algorithm had a median error rate of 1.9%, and that the error rate could be as high as 6.9% for off-market homes.
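To make the quoted figure concrete: a median error rate is commonly computed as the median of the absolute percentage differences between predicted and actual sale prices. The sketch below assumes that definition (the article does not say how Zillow computed its number) and uses made-up prices.

```python
from statistics import median


def median_abs_pct_error(predicted: list[float], actual: list[float]) -> float:
    """Median of |predicted - actual| / actual, expressed as a percentage."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return median(abs(p - a) / a * 100 for p, a in zip(predicted, actual))


# Made-up prices for illustration only; not Zillow data.
predicted = [310_000, 455_000, 198_000, 720_000, 262_000]
actual = [300_000, 450_000, 200_000, 700_000, 250_000]
print(f"{median_abs_pct_error(predicted, actual):.1f}%")
```

Because it is a median, half of the estimates miss by more than the reported figure. And a small percentage is still a large sum: on a hypothetical $300,000 home, a 1.9% error is $5,700, which can erase the margin on a quick flip.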

CNN reported that Zillow had bought 27,000 homes through Zillow Offers since its launch in April 2018, but had sold only 17,000 through the end of September 2021. Black swan events like the COVID-19 pandemic and a home renovation labor shortage contributed to the algorithm’s accuracy troubles.

Zillow said the algorithm had led it to unintentionally purchase homes at higher prices than its current estimates of future selling prices, resulting in a $304 million inventory write-down in Q3 2021.

In a conference call with investors following the announcement, Zillow co-founder and CEO Rich Barton said it might be possible to tweak the algorithm, but ultimately it was too risky.

Healthcare algorithm failed to flag Black patients

In 2019, a study published in Science revealed that a healthcare prediction algorithm, used by hospitals and insurance companies throughout the US to identify patients in need of “high-risk care management” programs, was far less likely to flag Black patients.

High-risk care management programs provide trained nursing staff and primary-care monitoring to chronically ill patients in an effort to prevent serious complications. But the algorithm was much more likely to recommend white patients for these programs than Black patients.

The study found that the algorithm used healthcare spending as a proxy for determining an individual’s healthcare need. But according to Scientific American, the healthcare costs of sicker Black patients were on par with the costs of healthier white people, which meant they received lower risk scores even when their need was greater.
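The proxy problem can be sketched with toy numbers. The example below is hypothetical and is not the algorithm from the study: it ranks patients once by past spending and once by a direct measure of illness (here, a count of chronic conditions, chosen as an assumption), then compares who gets flagged.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    chronic_conditions: int  # stand-in for actual health need (assumption)
    annual_cost: float       # past healthcare spending, the proxy


def flag_top(patients: list[Patient], key, k: int) -> set[str]:
    """Return the names of the k patients ranked highest by key."""
    ranked = sorted(patients, key=key, reverse=True)
    return {p.name for p in ranked[:k]}


# Invented records: C and D are sicker than A and B but generated less spending.
patients = [
    Patient("A", chronic_conditions=2, annual_cost=9_000),
    Patient("B", chronic_conditions=1, annual_cost=7_500),
    Patient("C", chronic_conditions=5, annual_cost=7_000),
    Patient("D", chronic_conditions=4, annual_cost=5_000),
]

by_cost = flag_top(patients, key=lambda p: p.annual_cost, k=2)
by_need = flag_top(patients, key=lambda p: p.chronic_conditions, k=2)
```

In this toy data, ranking by cost flags A and B while ranking by need flags C and D. When one group systematically spends less for the same level of illness, a cost-based score passes over the people the program is meant to reach.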

The study’s researchers suggested that a few factors may have contributed. First, people of color are more likely to have lower incomes, which may make them less likely to access medical care even when they are insured. Implicit bias may also cause people of color to receive lower-quality care.

While the study didn’t name the algorithm or the developer, the researchers told Scientific American they were working with the developer to address the situation.

Dataset trained Microsoft chatbot to spew racist tweets

In March 2016, Microsoft learned that using Twitter interactions as training data for ML algorithms can have dismaying results.

Microsoft released Tay, an AI chatbot, on the social media platform, and the company described it as an experiment in “conversational understanding.” The idea was that the chatbot would assume the persona of a teenage girl and interact with individuals via Twitter using a combination of ML and natural language processing. Microsoft seeded it with anonymized public data and some material pre-written by comedians, then set it loose to learn and evolve from its interactions on the social network.

Within 16 hours, the chatbot posted more than 95,000 tweets, and those tweets rapidly turned overtly racist, misogynist, and anti-Semitic. Microsoft quickly suspended the service for adjustments and ultimately pulled the plug.

“We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay,” Peter Lee, corporate VP, Microsoft Research & Incubations (then corporate VP of Microsoft Healthcare), wrote in a post on Microsoft’s official blog following the incident.

Lee noted that Tay’s predecessor, Xiaoice, released by Microsoft in China in 2014, had successfully conducted conversations with more than 40 million people in the two years prior to Tay’s release. What Microsoft didn’t take into account was that a group of Twitter users would immediately begin tweeting racist and misogynist comments to Tay. The bot quickly learned from that material and incorporated it into its own tweets.

“Although we had prepared for many types of abuses of the system, we had made a critical oversight for this specific attack. As a result, Tay tweeted wildly inappropriate and reprehensible words and images,” Lee wrote.

Amazon’s AI recruiting tool preferred male candidates

Like many large companies, Amazon is hungry for tools that can help its HR function screen applications for the best candidates. In 2014, Amazon started working on AI-powered recruiting software to do just that. There was only one problem: The system vastly preferred male candidates. In 2018, Reuters broke the news that Amazon had scrapped the project.

Amazon’s system gave candidates star ratings from 1 to 5. But the ML models at the heart of the system were trained on 10 years’ worth of résumés submitted to Amazon, most of them from men. As a result of that training data, the system started penalizing résumés that included the word “women’s” and even downgraded candidates from all-women colleges.

At the time, Amazon said the tool was never used by Amazon recruiters to evaluate candidates. The company tried to edit the tool to make it neutral, but ultimately decided it couldn’t guarantee it wouldn’t learn some other discriminatory way of sorting candidates and ended the project.