What is Wrong with Facebook
By Ega Wahyudi
—
Friday, March 20, 2020
—
The key flaw that made this outage so severe was the unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The automated system is meant to check for configuration values in the cache that are invalid and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
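A minimal Python sketch of the verify-and-repair step may make the weakness concrete. The names `verify_and_repair`, `CountingStore`, and the validity rule (a missing value or the literal string `"bad"` counts as invalid) are hypothetical; the post does not describe the real code. The query counter shows the difference between the two cases: a transient cache problem is repaired once, while an invalid persistent copy turns every check into another query.

```python
class CountingStore:
    """Stand-in for the persistent store that counts how often it is queried."""

    def __init__(self, data):
        self.data = dict(data)
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.data.get(key)


def is_valid(value):
    # Hypothetical validity rule: a missing value or the literal "bad" is invalid.
    return value is not None and value != "bad"


def verify_and_repair(key, cache, store):
    """Return a configuration value, repairing the cache from the store if needed."""
    cached = cache.get(key)
    if is_valid(cached):
        return cached  # common case: the cache is fine, no store query
    # The cached copy looks invalid: fetch the persistent copy and write it back.
    fresh = store.get(key)
    if fresh is not None:
        cache[key] = fresh
    return fresh


# Transient cache problem: one repair fixes it, later checks hit the cache.
cache = {"limit": "bad"}
store = CountingStore({"limit": "42"})
for _ in range(5):
    verify_and_repair("limit", cache, store)
print(store.queries)  # 1

# Invalid persistent copy: every check queries the store again.
bad_cache = {"limit": "bad"}
bad_store = CountingStore({"limit": "bad"})
for _ in range(5):
    verify_and_repair("limit", bad_cache, bad_store)
print(bad_store.queries)  # 5
```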
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
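The feedback loop can be reproduced with a toy simulation. This is a sketch under simplifying assumptions, not a model of Facebook's actual system: all clients share one cache key, the database cluster serves a fixed number of queries per tick and rejects the rest, and deletes from failing clients land after the writes from successful ones. The hypothetical `delete_on_error` flag switches between the behavior described above and a variant that leaves the cache alone when a query fails.

```python
def is_valid(value):
    # Hypothetical rule: a missing key (None) or the literal "bad" counts as invalid.
    return value is not None and value != "bad"


def simulate(num_clients, capacity, store_by_tick, delete_on_error):
    """Return the number of database queries issued on each tick.

    num_clients: clients that all check the same cached value (assumed)
    capacity: queries per tick the database cluster can serve (assumed)
    store_by_tick: persistent value at each tick, so the root cause can be fixed mid-run
    delete_on_error: True treats a failed query as an invalid value and deletes the key
    """
    cache = "bad"  # the shared cache starts out holding the bad value
    queries_per_tick = []
    for stored in store_by_tick:
        # Every client that sees an invalid value tries to repair it at the same time.
        queries = 0 if is_valid(cache) else num_clients
        served = min(queries, capacity)
        failed = queries - served
        if served:
            cache = stored  # successful repairs write whatever the store holds
        if failed and delete_on_error:
            cache = None  # failing clients delete the key, undoing those repairs
        queries_per_tick.append(queries)
    return queries_per_tick


timeline = ["bad"] * 3 + ["good"] * 3  # root cause fixed at tick 3
print(simulate(1000, 100, timeline, delete_on_error=True))   # [1000, 1000, 1000, 1000, 1000, 1000]
print(simulate(1000, 100, timeline, delete_on_error=False))  # [1000, 1000, 1000, 1000, 0, 0]
```

With deletion on error, the query rate never falls after the persistent value is fixed, because failing clients keep wiping out the repaired entry. Without it, one round of successful repairs is enough for the load to drop to zero on the next tick.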
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
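One common way to let people back onto a site gradually is to admit a stable, growing percentage of users, keyed on a hash of their user ID. The sketch below is an assumption for illustration only; the post does not say how Facebook throttled traffic during the recovery. `bucket`, `admitted`, and the `RAMP` schedule are hypothetical, and the health checks that would gate each step are not shown.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def admitted(user_id: str, percent: int) -> bool:
    """Admit a user if their bucket falls below the current rollout percentage."""
    return bucket(user_id) < percent


# Hypothetical ramp schedule: raise the admitted share step by step,
# checking database health between steps (health checks not shown).
RAMP = [1, 5, 25, 50, 100]

users = [f"user{i}" for i in range(10_000)]
for percent in RAMP:
    share = sum(admitted(u, percent) for u in users) / len(users)
    print(f"{percent:3d}% target -> {share:.1%} of users admitted")
```

Hashing keeps each user's admission decision stable across requests, so raising the percentage only adds users instead of reshuffling who is let in.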
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following the design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.