Can Amazon handle its fast-growing cloud?
Hurricane-like storms knocked an Amazon data center in Ashburn, Virginia, offline last night, and a chunk of the Internet felt it. The six-hour incident temporarily cut off a number of popular internet services, including Netflix, Pinterest, Heroku, and Instagram.
The outage was the second for this particular Amazon data center in the past month. It’s bad news for a cloud computing platform that’s sold as a more reliable alternative to traditional data centers.
In theory, big outages like this aren’t supposed to happen. Amazon is supposed to keep the data centers up and running (something it has become very good at), and customers like Netflix, freed from that drudgery, are supposed to be free to cook up compelling new web applications like video streaming.
In reality, though, Amazon data centers have outages all the time. In fact, Amazon tells its customers to plan for this to happen, and to be ready to roll over to a new data center whenever there’s an outage.
That’s what was supposed to happen at Netflix Friday night. But it didn’t work out that way. According to Twitter messages from Netflix Director of Cloud Architecture Adrian Cockcroft and Instagram Engineer Rick Branson, it looks like an Amazon Elastic Load Balancing service, designed to spread Netflix’s processing loads across data centers, failed during the outage. Without that ELB service working properly, the Netflix and Pinterest services hosted by Amazon crashed.
Friday’s outage wasn’t nearly as severe as the one that took out Amazon in April 2011. Then, a botched network update rolled across several data centers, causing widespread outages on the Amazon cloud.
“We lost a much bigger proportion of just one [Amazon data center] than the last power outage, and the ELBs didn’t route around it,” said Netflix’s Cockcroft, via Twitter.
So on Saturday, there are two big questions that need to be answered. First, why did Amazon’s Ashburn data center fail? A storm shouldn’t have taken out Amazon’s backup generators. Second, why were companies like Netflix so drastically affected by a single data center outage?
So far, Amazon isn’t saying a lot. “Severe thunderstorms caused us to lose primary and backup generator power to an Availability Zone in our east region overnight,” said Amazon spokeswoman Tera Randall on Saturday morning. “We have restored service to most of our impacted customers and continue to work to restore service for our remaining impacted customers.”
The powerful storms cut power to nearly a million customers, said Dominion Virginia Power. Winds hit 80 miles per hour, and the storms killed at least six people in Virginia, according to reports.
At Netflix, services were offline for about three hours (between 8 pm and 11 pm Pacific), according to company spokesman Joris Evers. “We’re actively working to analyze the cause and understand what happened,” he said Saturday.
Netflix doesn’t use Amazon to actually stream its video, so customers who were in the middle of watching movies wouldn’t have been interrupted. But Amazon powers virtually all of the back-end services on Netflix.com, so the outage made connecting and starting up new movies impossible for customers.
The full explanation of what actually went wrong is sure to be complex.
When asked via Twitter if he blamed Amazon or should have been better prepared for this outage, Instagram’s Branson said, “lol go troll someone else. I work for a living.”
Amazon is promising to tell more about what exactly happened in the week ahead, and so is Netflix. Cloud-watchers are waiting.