The advent of cloud services offered through data centers raises an important question: "How reliable are cloud services?" Cloud-based data centers can be configured in many ways, from non-redundant to fully redundant operations. A non-redundant configuration is the least reliable, while a fully redundant one is expected to be the most reliable but also the most expensive to implement and operate. Can you be sure that cloud data center operators are giving you useful information about their reliability?
It has been claimed that at least one company promoted a Tier III classification for its data center without having the Uptime Institute certify it. The Institute's Tier I through IV standards have unfortunately come to be used as generic terms, which has led many enterprises to assume that a particular data center satisfies the Institute's requirements when in fact the facility has not been certified.
What Reliability Numbers Do Not Include
Even with the best infrastructure design, events will occur that reduce the reliability of a data center, such as:
* Power failure
* Operator error
* Malicious attacks
* Fire
* Flood
* Storms/hurricanes
In most cases, data center operators do not factor service interruptions caused by events like these into their reliability computations. It may be reasonable to leave unpredictable events out, but doing so produces reliability numbers that are higher than what customers actually experience.
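To see how much such exclusions can matter, here is a minimal sketch in Python. The downtime figures and the function name `availability` are hypothetical, chosen only for illustration; they are not taken from any operator's report.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Return the fraction of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 1 - downtime_hours / period_hours


# Hypothetical figures for one year of operation.
counted_downtime = 1.5   # hours of equipment failure the operator counts
excluded_downtime = 8.0  # hours lost to a storm, left out of the report

reported = availability(counted_downtime)
experienced = availability(counted_downtime + excluded_downtime)

print(f"Reported availability:    {reported:.4%}")
print(f"Experienced availability: {experienced:.4%}")
```

Under these assumed figures, the reported number is higher than the experienced one only because the storm outage is missing from it.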
Tier Configurations
An Uptime Institute white paper defines the criteria and configurations for each of the Tier I through IV structures. None of the descriptions provides Mean Time Between Failures (MTBF) or Mean Time to Repair (MTTR) numbers; they describe only how the configuration is designed. This means that while a Tier IV configuration is designed to be more reliable than a Tier I configuration, there are no metrics to show how much more reliable it is.
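If a provider did publish MTBF and MTTR figures, the standard steady-state availability formula, MTBF / (MTBF + MTTR), would allow a direct comparison. The sketch below shows the arithmetic in Python. The function names and all of the numbers are hypothetical; the Tier descriptions supply no such figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time a repairable system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_annual_downtime_hours(avail):
    """Expected hours of downtime per year at the given availability."""
    return (1 - avail) * HOURS_PER_YEAR


# Hypothetical figures for two configurations with the same repair time.
config_a = steady_state_availability(mtbf_hours=2_000, mttr_hours=4)
config_b = steady_state_availability(mtbf_hours=20_000, mttr_hours=4)

for name, avail in (("A", config_a), ("B", config_b)):
    downtime = expected_annual_downtime_hours(avail)
    print(f"Config {name}: availability {avail:.5f}, "
          f"about {downtime:.1f} hours down per year")
```

With numbers like these in hand, a customer could compare two facilities directly instead of relying on the Tier label alone.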
Tier I is the most basic and least reliable data center infrastructure. It consists of a non-redundant computer/server configuration served by a single non-redundant network.
Tier II has redundant-capacity computer/servers but is still restricted to a single non-redundant network. It is designed to be more reliable than a Tier I configuration, but is still susceptible to disruption stemming from both planned and unplanned events.
Tier III configurations are the most commonly mentioned, as a majority of data centers achieve this certification. Tier III consists of redundant equipment with more than one network serving the equipment. Because of its redundant design, maintenance can more easily be performed without disruption or downtime, as the secondary network serves as a backup to keep operations running while updates are being rolled out on the primary network.
Tier IV configurations are designed to be fault tolerant and are thus the most expensive to implement. They have multiple independent, physically isolated sets of equipment connected by multiple networks, all working simultaneously.
Even though Tier III data centers are designed to be more reliable than Tier II in theory, a redundant Tier III configuration of unreliable components may produce a lower MTBF (i.e., more frequent failures) than a Tier II with highly reliable components.
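A rough Python sketch shows how this can happen. It rests on strong simplifying assumptions that are not from the Tier definitions: the redundant design is modeled as two identical units in parallel, failures are independent and exponentially distributed, and a failed unit is not repaired before the second one fails. Under those assumptions the mean time until both units are down is 1.5 times the MTBF of a single unit. The function name and the MTBF figures are hypothetical.

```python
def redundant_pair_mtbf(component_mtbf_hours):
    """Mean time until both units of a two-unit parallel pair have failed.

    Assumes identical units, independent exponential failures, and no repair.
    With failure rate lam = 1 / MTBF, the first failure takes 1 / (2 * lam)
    on average and the second a further 1 / lam, for a total of 1.5 / lam.
    """
    if component_mtbf_hours <= 0:
        raise ValueError("component MTBF must be positive")
    return 1.5 * component_mtbf_hours


# Hypothetical component figures, for illustration only.
cheap_unit_mtbf = 10_000    # hours, used twice in the redundant design
premium_unit_mtbf = 50_000  # hours, used once in the simpler design

redundant_design = redundant_pair_mtbf(cheap_unit_mtbf)
simple_design = premium_unit_mtbf

print(f"Redundant pair of cheap units: {redundant_design:,.0f} hours")
print(f"Single premium unit:           {simple_design:,.0f} hours")
```

Prompt repair of the first failed unit would raise the redundant figure considerably, so this is an illustration of the principle rather than a prediction for any real facility.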
Who Rates the Configurations?
The Uptime Institute employs personnel authorized to perform the rating analysis. The rating of a configuration is limited by the weakest portion of the topology or subsystem in that configuration. For example, a data center with a server that meets Tier IV requirements but is connected to a network that meets only Tier II requirements will be rated as a Tier II facility.
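The weakest-link rule described above amounts to taking the minimum over the subsystem ratings. The following Python sketch illustrates that rule only; it is not the Institute's actual assessment procedure, and the function name and subsystem labels are hypothetical.

```python
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


def facility_tier(subsystem_tiers):
    """Return the overall tier: the lowest tier among all subsystems."""
    if not subsystem_tiers:
        raise ValueError("at least one subsystem rating is required")
    for name, tier in subsystem_tiers.items():
        if tier not in ROMAN:
            raise ValueError(f"invalid tier {tier!r} for subsystem {name!r}")
    return min(subsystem_tiers.values())


# The example from the text: Tier IV servers on a Tier II network.
example = {"servers": 4, "network": 2}
overall = facility_tier(example)
print(f"Facility rating: Tier {ROMAN[overall]}")
```

One strong subsystem cannot lift the overall rating; only improving the weakest one does.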
When you subscribe to any cloud service, it is likely that the service provider will mention their Tier design level. DO NOT accept any such statement at face value unless the provider can produce documentation that they have satisfied the Uptime Institute's certification criteria with their design.
Some providers have used the Tier I through IV terms too loosely, in effect certifying themselves. As a result, you may find that the cloud service you use is less reliable than advertised.