Sorry, Something Went Wrong on Facebook

Earlier today Facebook was down or unreachable for many of you for about 2.5 hours. This is the worst outage we've had in over four years, and we wanted to first of all apologize for it. We also wanted to give much more technical detail on what happened and share one big lesson learned.

What's Wrong With Facebook


The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn't work when the persistent store itself is invalid.
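As a rough illustration, the intended behavior might look like the Python sketch below. The names (`is_valid`, `get_config`, `feature_mode`) and the validity rule are hypothetical, chosen only for the example; they are not taken from the actual system.

```python
# A minimal sketch of the cache-repair behavior described above.
# All names and the validity rule are assumptions for illustration.

def is_valid(value):
    """Assumed validity rule: a configuration value is a non-empty string."""
    return isinstance(value, str) and value != ""


def get_config(key, cache, store):
    """Return the cached value, repairing it from the persistent store if invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    # The cached copy is missing or invalid: refetch from the persistent store.
    fresh = store[key]
    cache[key] = fresh
    return fresh


cache = {"feature_mode": ""}    # transiently corrupted cache entry
store = {"feature_mode": "on"}  # the persistent store holds a good value
repaired = get_config("feature_mode", cache, store)
```

Here the single repair succeeds and later reads are served from the cache. If `store` held the invalid value instead, every call would fall through to the store, which is the failure mode this post describes.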

Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn't allow the databases to recover.

The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
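The feedback loop and the recovery can be reproduced in a toy simulation. This is only a sketch under stated assumptions: the capacity of 10 queries per tick, the 1000 clients, and every name in the code are invented for illustration, and all clients are assumed to read the cache at the same moment within a tick.

```python
class OverloadedError(Exception):
    """The (simulated) database cluster could not serve the query."""


class Database:
    def __init__(self, capacity, value):
        self.capacity = capacity  # assumed: queries it can serve per tick
        self.value = value        # persistent copy of the configuration value
        self.load = 0

    def begin_tick(self):
        self.load = 0

    def query(self):
        self.load += 1
        if self.load > self.capacity:
            raise OverloadedError()
        return self.value


def is_valid(value):
    """Assumed validity rule: a non-empty string."""
    return isinstance(value, str) and value != ""


def run_tick(cache, db, clients, key="cfg"):
    """Simulate one tick; return how many clients queried the database."""
    db.begin_tick()
    # Every client reads the cache at the same moment, so all see the same state.
    repairing = clients if not is_valid(cache.get(key)) else 0
    for _ in range(repairing):
        try:
            cache[key] = db.query()
        except OverloadedError:
            # The flaw: an error is treated as an invalid value,
            # so the client deletes the cache key.
            cache.pop(key, None)
    return repairing


db = Database(capacity=10, value="")  # invalid value in the persistent store
cache = {"cfg": ""}
queries = [run_tick(cache, db, clients=1000)]
db.value = "on"                       # original problem fixed
queries += [run_tick(cache, db, clients=1000) for _ in range(2)]
stuck = "cfg" not in cache            # key keeps getting deleted

queries.append(run_tick(cache, db, clients=0))     # site turned off
queries.append(run_tick(cache, db, clients=5))     # a few people let back on
queries.append(run_tick(cache, db, clients=1000))  # full traffic again
```

In this model the query count stays at 1000 per tick even after the stored value is corrected, because the failed queries keep deleting the key that the successful ones just wrote. Only after traffic drops below the database's capacity does a good value stay in the cache.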

This got the site back up and running today, and for now we've turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
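This post does not say what those new designs look like. Purely as an illustration of one common pattern, the sketch below treats a store error as "no information" rather than as an invalid value, falls back to the last known good value, and limits how often a client retries. `ConfigReader`, `StoreUnavailable`, and `retry_interval` are hypothetical names, not part of any real Facebook system.

```python
import time


class StoreUnavailable(Exception):
    """Raised by the (hypothetical) fetch function when the store cannot answer."""


def is_valid(value):
    """Assumed validity rule: a non-empty string."""
    return isinstance(value, str) and value != ""


class ConfigReader:
    def __init__(self, fetch, retry_interval=30.0, clock=time.monotonic):
        self.fetch = fetch                    # callable: key -> value
        self.retry_interval = retry_interval  # seconds between repair attempts
        self.clock = clock
        self.cache = {}
        self.last_good = {}
        self.next_attempt = {}

    def get(self, key, default=None):
        value = self.cache.get(key)
        if is_valid(value):
            return value
        now = self.clock()
        if now >= self.next_attempt.get(key, float("-inf")):
            # Schedule the next attempt first, so a failure cannot cause
            # an immediate retry.
            self.next_attempt[key] = now + self.retry_interval
            try:
                fresh = self.fetch(key)
            except StoreUnavailable:
                fresh = None  # an error is not evidence that the value is invalid
            if is_valid(fresh):
                self.cache[key] = fresh
                self.last_good[key] = fresh
                return fresh
        # Serve the last known good value (or a default) instead of retrying.
        return self.last_good.get(key, default)


calls = []

def failing_fetch(key):
    calls.append(key)
    raise StoreUnavailable()

now = [0.0]  # fake clock so the example is deterministic
reader = ConfigReader(failing_fetch, retry_interval=30.0, clock=lambda: now[0])
results = [reader.get("feature_mode", default="off") for _ in range(1000)]
attempts_in_first_window = len(calls)
now[0] = 31.0
reader.get("feature_mode", default="off")
attempts_after_window = len(calls)
```

With this shape, a thousand reads against a failing store produce one query per retry interval per client rather than one query per read, so errors no longer generate additional load.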

We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.