Coming Back Up is More Painful than Going Down

By on October 24, 2012

Lessons learned from the Amazon Web Services outage: This is a good post about what we should have learned from the recent AWS outage. Of its lessons, these two are the most important.

The stress of failure will trigger a cascade of other failures. […] What started as a small issue affecting one Northern Virginia data center quickly spread, causing a chain reaction and outage that disrupted much of the Internet for several hours.

[…] Spikes matter. When a cloud fails, hundreds of customers are impacted. As they try to recover, they will be stressing the cloud provider’s infrastructure with a peak load that is guaranteed to cause even more problems.

That second one deserves special attention.

Reddit recently replaced its “we’re down” screen, but the old one showed the Reddit alien lying unconscious on the ground while another alien stood over it, holding a hammer and about to hit the prone alien with it. The hammer was labeled “F5,” the browser's refresh key.

Very few things ever come back cleanly.
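A toy model makes the arithmetic behind “spikes matter” concrete. This is a minimal sketch under stated assumptions, not anything from the original post: it assumes every request that fails during an outage is retried until it succeeds, that new traffic keeps arriving at a steady rate, and that the recovered service has a fixed per-tick capacity. The function name `simulate_recovery` and all of the numbers are hypothetical.

```python
from typing import Optional, Tuple


def simulate_recovery(normal_load: int, capacity: int, outage_ticks: int) -> Tuple[int, Optional[int]]:
    """Return (peak demand, ticks until the retry backlog clears) after an outage.

    Hypothetical toy assumptions:
      * clients send `normal_load` new requests every tick, whether the service is up or down;
      * every request that fails during the outage is retried until it succeeds;
      * once back up, the service handles at most `capacity` requests per tick.
    Returns None for the tick count if there is no spare capacity to drain the backlog.
    """
    backlog = normal_load * outage_ticks  # requests that piled up while the service was down
    peak_demand = normal_load + backlog   # demand on the first tick back up (everyone hits F5)
    if capacity <= normal_load:
        return peak_demand, None          # no headroom: the backlog never drains
    ticks = 0
    while backlog > 0:
        demand = normal_load + backlog    # new traffic plus every pending retry
        served = min(demand, capacity)    # the service can only do so much per tick
        backlog = demand - served         # whatever is left waits for the next tick
        ticks += 1
    return peak_demand, ticks


if __name__ == "__main__":
    peak, ticks = simulate_recovery(normal_load=1000, capacity=1200, outage_ticks=30)
    print(peak, ticks)  # 31000 150
```

With 20% headroom, a 30-tick outage takes 150 ticks to clear, five times as long as the outage itself, and the first tick back up sees 31 times the normal load. Real clients time out, give up, or back off, so this overstates some effects and ignores others (such as the spike knocking the service over again); it is only meant to show why recovery load dwarfs steady-state load.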
