Friday, August 10, 2007

365Main - an example of great disaster recovery communications

Even if you weren't affected by the 365Main power outage, you should read the status update posted by their president.

There are a few things to note here:

  • The entire event is broken down with technical detail.
  • Details of the problem solving, maintenance, and testing are available and relatively transparent, giving customers a clear picture of what happened.
  • The tone is professional and communicates issues and events clearly.
  • Customers are reminded of what recompense their contract provides for them.
  • There is a clearly explained troubleshooting process.
  • There is a clearly explained plan to prevent future issues.
  • The data is being shared with other data centers to help prevent similar issues elsewhere, giving back to the community.
Also worth noting: the status has been updated regularly, and each update includes current information and next steps.

If you are ever in a recovery situation, this is a great example of after-action communication to follow.

On the technical side, remember that simply having backup power may not be enough. Many of the customers whose power was interrupted would have kept their machines running if they had dedicated UPS units, but without power to their network uplinks, those machines might still have been unable to reach the outside world.
