Tuesday, October 12, 2021

What Would Happen If Amazon Web Services Eastern Region Went Down? (TL;DR: it would be very bad)

Mamas, don't let your babies grow up to be risk managers. 
They'll worry themselves sick and sometimes wish they were a cowboy.

From ongoing

Worst Case

Suppose you’re running your organization’s crucial apps in the cloud. Specifically, suppose you’re running them on AWS, and in particular in the “us-east-1” region. Could us-east-1 go away? What might you do about it? Let’s catastrophize!

Acks & disclaimers · First, thanks to Corey Quinn for this Twitter thread, which got me thinking.

Corey Quinn tweet on us-east-1

Second, while I worked for AWS for 5½ years, I’ve never been near a data center, nor do I have any inside information about the buildings, servers, or networking. On the other hand, I do have a decent understanding of AWS culture and capabilities in software engineering and operations. Bear those facts in mind as you read this.

Finally, since this blog fragment concerns itself entirely with catastrophic scenarios, I’ll try to be cheerful about it.

[Those of you who know what us-east-1 is can skip over the next section to the first entertaining disaster.]

“us-east-1”? · AWS means “Amazon Web Services”, Amazon’s insanely huge ($60B/year revenue) and profitable (~30% margin) collection of cloud-computing services. Basically, AWS will rent you computers and databases and the use of many other software services. So more or less everything your IT department owns can be rented by the hour (or second) rather than installed in your own data center.

If you’re using AWS, you have to pick one (or more) of its (24, as I write) “regions” to host your systems. They have boring names like “us-west-2” (Portland) and “ap-northeast-1” (Tokyo).

“us-east-1” (N. Virginia) is generally thought to be the biggest region, by a huge margin. There have been estimates that 30% of all Internet traffic flows through it. Here’s AWS’s official write-up and here’s a nice Atlantic story by a person who drove around Northern Virginia looking for the actual buildings.

Before we leave the subject, I should say that each AWS region is divided into multiple “availability zones” (AZs), data centers that are independently operated and geographically separated, so to really lose a whole region, you’d have to take all of them out.

If us-east-1 went off the air, it would be Really Bad. How could that happen?

Terrestrial disaster · This is the first one anybody thinks of.

Suppose a big late-summer hurricane somehow misses Florida and Texas, cruises north offshore picking up energy from an anomalously-warm western Atlantic, turns left just south of DC, and savages anywhere that’s easy driving distance from Dulles airport. We’re talking about inches of rain in a few hours so every waterway floods; also, high winds and lightning are playing hell with the electrical and network infrastructure.

The other obvious candidate would be an earthquake, which can ravage infrastructure to a degree unequaled by any other flavor of natural catastrophe. Among other things, the Potomac bridges and lots of freeway overpasses would be rubble, so your ability to bring help in would be severely reduced.

If you’re the unlucky proprietor of systems hosted at us-east-1, they’d be off the air, and while AWS would probably arrange to answer your distress call, there’s really not much that could be done. How would your business do if it were off the air for, uh, nobody really knows how long?

How much should you worry? · This one worries me less than a lot of the other scenarios here. First off, the hurricane scenario is so utterly predictable that I bet anyone with a significant data-center presence in the region has been planning and wargaming around this one for at least a decade.....

.....MUCH MORE