Max Krebs

A Sidebar to Gloat

On February 28th, 2017, in a perfect representation of how this year is going, Amazon’s AWS S3 service fell over. Amazon described it as ‘Increased Error Rates’, although to me, that seemed to be understating the situation a bit.

S3 is Amazon’s cloud storage solution that most of the Internet uses to serve files and assets like images or PDFs. What happened was that some issue left S3’s US-east region unable to accept or serve requests. I would have thought this meant maybe some images on some sites would be broken, but this outage illustrated the danger of putting all your technology eggs in one basket. One outage in one isolated service is an inconvenience, but the cascade of failures across inextricably linked systems is a major vulnerability in our web infrastructure. This reminded me very much of the mass DNS outage caused by a DDoS attack on one DNS provider.

As you know if you’ve been reading my series about migrating my hosting from Heroku to Linode, just the week before the S3 outage I moved this site and (just the day before the outage) my podcast discovery site onto a Linux VPS. This comes as a giant relief to me because, even though it shouldn’t be the case, Heroku apps are down because of the S3 problems. My question is: why are a hosting provider’s app containers going down because a file storage service is down?

The bottom line is: if I hadn’t moved my hosting infrastructure from Heroku to a Linux VPS on Linode, all of my websites would be down right now and there would be nothing I could do about it.

I can’t gloat too much. For both sites I use S3 to serve images; I just happened to pick the Oregon region, which isn’t experiencing any outages. But now I will be looking for a better solution to host images that isn’t so prone to taking down the entire web.