Something Went Wrong Facebook

Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we've had in over four years, and we want first of all to apologize for it. We also want to provide more technical detail on what happened and share one big lesson learned.

What's Wrong With Facebook

The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
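
As a rough illustration of the behavior described above, here is a minimal sketch in Python. The names `get_config`, `load_from_store`, and `is_valid` are hypothetical and are not taken from Facebook's code; the cache is modeled as a plain dictionary.

```python
from typing import Callable, Dict, Optional


def get_config(
    key: str,
    cache: Dict[str, str],
    load_from_store: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Return a config value, repairing the cache from the persistent store.

    Hypothetical sketch: a missing or invalid cached value triggers one
    query to the persistent store, and the result overwrites the cache.
    """
    value = cache.get(key)
    if value is not None and is_valid(value):
        return value  # cache is healthy, no store query needed

    # Cached value is missing or invalid: fetch a fresh copy.
    fresh = load_from_store(key)
    cache[key] = fresh
    return fresh
```

If the store itself returns a value that `is_valid` rejects, the cache is never repaired, so every later call queries the store again. That is the case the repair logic was not built for.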

Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
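
The difference between treating a database error as an invalid value and treating it as a separate condition can be sketched as follows. This is an illustration only: `StoreUnavailable`, `refresh_naive`, and `refresh_guarded` are hypothetical names, and the guarded variant is one possible alternative, not the fix Facebook deployed.

```python
from typing import Callable, Dict, Optional


class StoreUnavailable(Exception):
    """Hypothetical error raised when the database cluster cannot answer."""


def refresh_naive(
    key: str, cache: Dict[str, str], load_from_store: Callable[[str], str]
) -> Optional[str]:
    """Behavior described in the post: an error is treated like an invalid value."""
    try:
        cache[key] = load_from_store(key)
    except StoreUnavailable:
        # Deleting the key guarantees the next read queries the store again,
        # which adds load to a cluster that is already failing.
        cache.pop(key, None)
    return cache.get(key)


def refresh_guarded(
    key: str, cache: Dict[str, str], load_from_store: Callable[[str], str]
) -> Optional[str]:
    """Assumed alternative: on error, keep whatever the cache already holds."""
    try:
        cache[key] = load_from_store(key)
    except StoreUnavailable:
        pass  # leave the cached entry alone so reads stop hitting the store
    return cache.get(key)
```

With a loader that always raises `StoreUnavailable`, `refresh_naive` empties the cache entry and so keeps generating queries, while `refresh_guarded` continues to serve the last cached value.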

The way to stop the feedback loop was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
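
The post does not say which design patterns are under consideration. One widely used way to keep retries from amplifying a spike is capped exponential backoff with random jitter, sketched here under that assumption. The function name `backoff_delays` and its default parameters are hypothetical.

```python
import random
from typing import List, Optional


def backoff_delays(
    base: float = 0.1,
    cap: float = 30.0,
    attempts: int = 6,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized wait time (in seconds) per retry attempt.

    The ceiling doubles on each attempt up to `cap`, and the actual delay
    is drawn uniformly between 0 and that ceiling, so clients that fail
    at the same moment do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

A client would sleep for each delay in turn before retrying its query, spreading the retries out over time instead of sending them all at once to a recovering cluster.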

We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.