What's Wrong with Facebook
By pusahma2008, Monday, May 20, 2019
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the persistent store itself is invalid.
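The behavior described above can be illustrated with a minimal sketch. It is not the actual system: `CountingStore`, `is_valid`, `read_config`, `store_queries_for`, the `"limit"` key, and the rule that an empty string is invalid are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Dict, Optional


class CountingStore:
    """Hypothetical stand-in for the persistent store; counts its queries."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data
        self.queries = 0

    def fetch(self, key: str) -> Optional[str]:
        self.queries += 1
        return self.data.get(key)


def is_valid(value: Optional[str]) -> bool:
    # Assumed rule for this sketch: any non-empty string is valid.
    return bool(value)


def read_config(key: str, cache: Dict[str, Optional[str]],
                store: CountingStore) -> Optional[str]:
    """Return the cached value, repairing it from the store if it looks invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value  # cache hit: the store is not touched
    fresh = store.fetch(key)  # repair attempt: one store query
    cache[key] = fresh
    return fresh


def store_queries_for(store_value: str, reads: int) -> int:
    """Start with a bad cache entry, perform `reads` lookups, count store queries."""
    store = CountingStore({"limit": store_value})
    cache: Dict[str, Optional[str]] = {"limit": ""}  # invalid cached entry
    for _ in range(reads):
        read_config("limit", cache, store)
    return store.queries
```

Under these assumptions, `store_queries_for("100", 1000)` costs a single store query, because the first read repairs the cache. `store_queries_for("", 1000)` costs one query per read, because the "repair" keeps writing back an invalid value.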
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
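A toy model makes the loop easier to see. It is a deliberate simplification, not a reconstruction of the real system: the function name `simulate`, the single shared cache key, the fixed per-tick database capacity, and the rule that any failed query wipes the key are all assumptions made for this sketch.

```python
from typing import List


def simulate(clients: int, capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued in each tick.

    Assumed model: all clients share one cache key. While the key is
    missing, every client queries the database once per tick. The database
    can serve at most `capacity` queries per tick; the rest fail.
    """
    key_cached = False
    load: List[int] = []
    for _ in range(ticks):
        if key_cached:
            load.append(0)  # everyone is served from the cache
            continue
        queries = clients
        load.append(queries)
        succeeded = min(queries, capacity)
        failed = queries - succeeded
        if succeeded > 0:
            key_cached = True  # a successful client repopulates the cache
        if failed > 0 and delete_on_error:
            # A failing client treats the error as an invalid value and
            # deletes the key, undoing the repair for everyone.
            key_cached = False
    return load
```

In this model, `simulate(1000, 100, 5, delete_on_error=True)` stays at 1000 queries in every tick, while the same call with `delete_on_error=False` drops to zero after the first tick. When the client count is below capacity, no query fails and the load also clears after one tick.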
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
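The post does not say which design patterns are meant. As one illustration only, two widely used ideas are to treat a failed query differently from an invalid value, and to space out retries with jittered exponential backoff. The sketch below assumes both; `StoreUnavailable`, `read_config_safely`, `backoff_delay`, and their parameters are hypothetical names, not part of any Facebook system.

```python
import random
from typing import Callable, Dict, Optional


class StoreUnavailable(Exception):
    """Hypothetical error raised when a query to the persistent store fails."""


def read_config_safely(key: str,
                       cache: Dict[str, Optional[str]],
                       fetch: Callable[[str], Optional[str]],
                       is_valid: Callable[[Optional[str]], bool]) -> Optional[str]:
    """Repair the cache only when the store returns a value that is valid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    try:
        fresh = fetch(key)
    except StoreUnavailable:
        # A failed query says nothing about the value: leave the cache alone.
        return value
    if is_valid(fresh):
        cache[key] = fresh
        return fresh
    return value  # the store is invalid too: do not overwrite the cache


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Jittered exponential backoff: a random delay up to min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)
```

A caller would wait `backoff_delay(attempt)` seconds before its next repair attempt, so that clients which failed together do not all retry at the same instant.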
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.