by Christopher Burgess

Contributing Writer

Facebook outage a prime example of insider threat by machine

Opinion

Nov 04, 2021 · 5 mins

Threat and Vulnerability Management

A buggy automated audit tool and human error took Facebook offline for six hours. Key lesson for CISOs: Look for single points of failure and hedge your bets.

The longest six hours in Facebook’s history took place on October 4, 2021, as Facebook and its sister properties went dark. The social network suffered a catastrophic outage. The only silver lining to the outage, if there is one, is that the outage wasn’t caused by malicious actors. Rather, it was a self-inflicted wound caused by Facebook’s own network engineering team.

According to Facebook’s first engineering blog post, published on October 4, the company determined that “configuration changes on the backbone routers that coordinated network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

Facebook followed up on October 5 with a second post offering more details: “A command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all connections in our backbone network, disconnecting Facebook data centers globally.” The post explained that the company’s systems are designed to audit such commands to prevent this type of mistake, but “a bug in that audit tool prevented it from properly stopping the command.”

Yes, yet another instance where the machines turned out to be the insider that caused the havoc.

Impact of a machine-based insider event

Once the backbone went down, Facebook’s Domain Name System (DNS) servers withdrew their BGP (Border Gateway Protocol) route advertisements, so the routes to them essentially went blank. Neither Facebook’s own properties (including Instagram and WhatsApp) nor the rest of the internet could find them. When the audit tool failed, the platforms themselves became unreachable. The company wasn’t able to operate remotely, so all work had to be managed locally. Imagine the gyrations that were necessary to manually bypass all the technological barriers to entry that were in place and were now defaulting to their error status.
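A toy model can make the DNS and BGP interplay concrete. Facebook’s October 5 post described DNS servers that stop advertising their routes when they cannot reach the data centers. The sketch below is not Facebook’s implementation: the class names, the documentation-range prefixes, and the simplified health rule are all assumptions, chosen only to show how otherwise healthy servers can vanish from the internet’s routing table in one step.

```python
from dataclasses import dataclass, field


@dataclass
class DnsSite:
    """A hypothetical edge DNS location that announces one route prefix."""
    name: str
    prefix: str
    backbone_reachable: bool = True

    def should_advertise(self) -> bool:
        # Assumed health rule: announce the route only while the backbone is reachable.
        return self.backbone_reachable


@dataclass
class RoutingTable:
    """Stand-in for the routes the rest of the internet has learned."""
    routes: dict = field(default_factory=dict)

    def sync(self, sites):
        # Rebuild the table from whatever each site currently advertises.
        self.routes = {s.prefix: s.name for s in sites if s.should_advertise()}

    def can_reach(self, prefix: str) -> bool:
        return prefix in self.routes


def simulate_backbone_outage():
    """Return (reachable_before, reachable_after) for a two-site toy network."""
    sites = [
        DnsSite("dns-a", "192.0.2.0/24"),
        DnsSite("dns-b", "198.51.100.0/24"),
    ]
    table = RoutingTable()
    table.sync(sites)
    before = all(table.can_reach(s.prefix) for s in sites)

    # One bad command disconnects the backbone for every site at once.
    for s in sites:
        s.backbone_reachable = False
    table.sync(sites)
    after = any(table.can_reach(s.prefix) for s in sites)
    return before, after


if __name__ == "__main__":
    before, after = simulate_backbone_outage()
    print(f"reachable before outage: {before}, after outage: {after}")
```

In this model nothing is wrong with the DNS servers themselves. They follow their health rule correctly, and because every site depends on the same backbone, the rule removes all of them together.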

Additionally, it was widely reported that internet of things (IoT) devices and services inside the company that relied on the same internal infrastructure were also affected, including access control, company email, and employee online workspaces, all of which are managed in house.

The impact went beyond Facebook’s 3.5 billion users eager to share their photos, opinions, and recipes. Third-party entities that tied their authentication process to Facebook had clients/customers/employees unable to access their accounts. Individual users who opted to use their Facebook account as their log-in were also found twiddling their thumbs waiting for the outage to end, as access to their desired domains was being blocked due to the unavailability of the authentication processes.

Lessons for CISOs from the Facebook outage

Is this an instance of technical decisions being made by non-technical leaders? Cary Conrad, chief development officer at SilverSky, comments that the self-inflicted outage is “emblematic of a broader leadership issue in the tech world.” He says that over more than 20 years he has seen that “Good management trumps good technology every time, yet due to the ever-changing threatscape of the tech industry, inexperienced leadership is oftentimes relied upon for the sake of expediency.” Within the world of cybersecurity, he continues, “The Peter Principle is in full effect. People progress to their level of incompetence, meaning a lot of people in leadership within cyber have risen to a level that is difficult for them to execute and often lack formal technical training. As a CISO, there is a need to configure, identify, and negotiate the cost of protecting an organization, and without the adequate experience or a disciplined approach, this mission is executed poorly.”

While the knee-jerk reaction may be to punish the engineer who gave the update order, that would be misdirected ire. The real culprit, in this instance, is Facebook’s own architecture. It allowed their network to fail the most basic of network tenets: Do not allow for a single point of failure.

Facebook’s infrastructure collapsed when the automated audit process failed due to an undetected (or known but not yet mitigated) bug.
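The general principle at stake is that a safety check should fail closed: if the audit cannot deliver a verdict, the command does not run. The following is a minimal sketch of that idea under stated assumptions. The function names, the `--all-links` flag, and the audit rule are hypothetical, and nothing here reflects how Facebook’s actual tooling is built or how its bug behaved.

```python
class AuditError(Exception):
    """Raised when a command is rejected or the audit itself cannot complete."""


def buggy_audit(command: str) -> bool:
    # Hypothetical audit tool with a bug: it crashes instead of returning a verdict.
    raise RuntimeError("audit tool crashed")


def strict_audit(command: str) -> bool:
    # Hypothetical rule: reject anything that would touch every backbone link.
    return "--all-links" not in command


def run_guarded(command: str, audit, execute):
    """Run execute(command) only if audit explicitly approves it (fail closed)."""
    try:
        approved = audit(command)
    except Exception as exc:
        # A broken audit is treated as a rejection, never as an approval.
        raise AuditError(f"audit failed, command blocked: {exc}") from exc
    if approved is not True:
        raise AuditError("audit rejected command")
    return execute(command)


if __name__ == "__main__":
    executed = []
    try:
        run_guarded("drain --all-links", buggy_audit, executed.append)
    except AuditError as err:
        print(f"blocked: {err}")
    print(f"commands executed: {executed}")
```

The design choice worth noting is the `approved is not True` test: only an explicit approval lets the command through, so a crash, a `None`, or any other unexpected result from the audit blocks execution.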

Tom Krazit and Joe Williams hit the nail on the head in their summation, published in Protocol, of the three learning opportunities for CISOs that come out of Facebook’s outage:

Plan for the worst. Enterprises need a contingency plan for the complete loss of their computing resources or network connection, not just the loss of a data center or cloud region.
Hedge your bets. It’s extremely unlikely that the entire internet will go down at the same time; hedging at least a few bets across multiple service providers could be worth the effort.
Check your priorities. There’s no way to run an operation the size of Facebook without a serious amount of automation, which means code-auditing tools like the one that failed to stop this outage need extra attention.
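For the third-party log-in problem described earlier, “hedge your bets” could look something like the sketch below: try a list of identity providers in order and fall back when one is unreachable. This is an illustration only. The provider functions, the exception type, and the session format are hypothetical, and a real deployment would need users to have a second credential enrolled in advance.

```python
class ProviderUnavailable(Exception):
    """Raised when an identity provider cannot be reached."""


def social_login(user: str) -> str:
    # Hypothetical primary provider, simulated here as being offline.
    raise ProviderUnavailable("social login is unreachable")


def local_password_login(user: str) -> str:
    # Hypothetical fallback that the organization controls itself.
    return f"session-for-{user}"


def authenticate(user: str, providers):
    """Try each (name, login) pair in order; return (name, session) from the first that answers."""
    errors = []
    for name, login in providers:
        try:
            return name, login(user)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("; ".join(errors))


if __name__ == "__main__":
    providers = [("social", social_login), ("local", local_password_login)]
    print(authenticate("alice", providers))
```

The fallback only helps if the second provider does not share infrastructure with the first; two log-in paths that depend on the same network are still a single point of failure.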

October 4 was a bad day for Facebook, and a tweet from Jonathan Zittrain, Harvard Law professor at the School of Engineering and Applied Science, wryly summarized it: Facebook basically locked its keys in the car.

Christopher Burgess is a writer, speaker and commentator on security issues. He is a former senior security advisor to Cisco, and has also been a CEO/COO with various startups in the data and security spaces. He served 30+ years within the CIA which awarded him the Distinguished Career Intelligence Medal upon his retirement. Cisco gave him a stetson and a bottle of single-barrel Jack upon his retirement. Christopher co-authored the book, “Secrets Stolen, Fortunes Lost, Preventing Intellectual Property Theft and Economic Espionage in the 21st Century”. He also founded the non-profit, Senior Online Safety.
