Sunday, April 08, 2012

When Clouds Go Bump

There has certainly been a lot of discussion at the CIO level about cloud computing. A CIO I spoke with recently was looking at the cloud to help him address his challenges in staffing and running his data center. He and I had deep discussions about his in-house alternatives to the cloud. Interestingly, while we were having these discussions, one of the biggest clouds went bump.

On February 29, 2012, Microsoft's Azure product had a global service failure. Microsoft's Azure Service Dashboard was overwhelmed and pretty much unavailable for hours on end. When it was accessible, this was one of the updates, posted approximately 8 hours into the outage:
"The restoration steps to mitigate the issue are still underway. This incident impacts Access Control 2.0, Marketplace, Service Bus and the Access Control & Caching Portal in the same regions where Windows Azure Compute is impacted. As a result affected customers may experience a loss of application functionality. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers."
This is not Microsoft bashing. In April 2011, Amazon also suffered a broad failure.

What I want to highlight is that cloud providers are not immune to service failures. They are likely capable of providing more redundant and resilient services than many organizations can provide on their own.

My question is this: as a CIO, are you comfortable with this level of opacity around service failures? Are you willing to answer your users and executives with a one-way flow of information from a (sometimes available) web page? Would a 10-day service credit make your CEO happy?
