
The Ultimate Datacenter – for the moment.

17 Jan

One thing I have learned in all the years I have been doing IT is that there is always "the Next Big Thing", "the Killer App" or the "Ultimate Datacenter".  As I look back over the years, I see many strategic inflection points (oh my, that term is so last century, 1998 to be exact) that are part of the IT roadmap, measured in typical IT lifetimes.

These are the changes that, according to Andy Grove, are "what happens to a business when a major change takes place in its competitive environment. A major change due to introduction of new technologies. A major change due to the introduction of a different regulatory environment."

Now, Moore's law describes a long-term trend in the history of computing hardware: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years.

So if I map Grove's comments onto Moore's law, one doubling roughly every two years since 1998, we have 6 IT lifetimes! Many reinventions and many changes, so let me look at the datacenter for a moment.
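
For what it is worth, a quick back-of-the-envelope sketch of that arithmetic follows; the end year used here is an illustrative assumption.

```python
# Back-of-the-envelope: how many two-year Moore's-law doubling cycles
# ("IT lifetimes") fit between Grove's 1998 and a given year?
# The end year below is an illustrative assumption.
def it_lifetimes(end_year: int, start_year: int = 1998, cycle_years: int = 2) -> int:
    return (end_year - start_year) // cycle_years

print(it_lifetimes(2010))  # -> 6
```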

Datacenters house the computer-based system resources that the business consumes.  Due to their very nature, they are strategic, as they often house some of the most sensitive business IP and customer information.  As such, they consume the lion's share of the IT budget.  They are also energy hogs, requiring power to run, generating heat and requiring cooling.

<ETLA Alert>

The science of managing these has led to a discipline called Data Center Infrastructure Management (DCIM), which we at N'compass are really good at.  In particular, we have developed a metric, Cost to Compute (C2C), which allows the business to see the cost of the DC as a functioning component of the business.  It is a methodology that correlates Power Usage Effectiveness (PUE) and compute usage factors to produce a baseline from which to track value for IT operational effectiveness (a rough sketch of the idea follows below).

</ETLA Alert>
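
To make the C2C idea a little more concrete, here is a minimal, hypothetical sketch. The cost formula and the figures are simplified illustrations rather than the full methodology; only the PUE ratio (total facility power divided by IT equipment power) is the standard definition.

```python
# A rough, illustrative sketch of a Cost-to-Compute style metric.
# The cost formula below is a simplification for illustration only;
# PUE is the standard ratio of total facility power to IT equipment power.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: how much overhead the facility adds."""
    return total_facility_kw / it_equipment_kw

def cost_to_compute(total_facility_kw: float,
                    energy_cost_per_kwh: float,
                    useful_compute_units: float,
                    hours: float = 24 * 30) -> float:
    """Hypothetical monthly energy cost per unit of useful compute work.

    'useful_compute_units' could be VM-hours, transactions processed,
    or any business-facing measure of work delivered in the same period.
    """
    energy_cost = total_facility_kw * hours * energy_cost_per_kwh
    return energy_cost / useful_compute_units

if __name__ == "__main__":
    # Illustrative numbers only: 500 kW facility draw, 300 kW of IT load,
    # $0.12 per kWh, 40,000 units of useful work in the month.
    print(f"PUE: {pue(500, 300):.2f}")                        # ~1.67
    print(f"C2C: ${cost_to_compute(500, 0.12, 40_000):.2f} per unit of work")
```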

So, back to the point of this post, the Ultimate Datacenter: what is it and how does it currently look? Datacenters are typically built in "Hard Blocks", which are built to standards and fitted according to requirement.  However, they are often over-engineered because they need to take into account the inefficiencies of current power conversion technologies.  For example: utility company power is converted to DC for the UPS batteries, converted back to "clean" AC power which is fed to the computer, only to be converted once again to DC inside it.

Yet another inefficiency is cooling.  Air-cooled rooms with CRAC units have to be built big to obtain efficiencies.  Cool air is forced up into server racks through the raised flooring, drawn through the running components and out into hot aisles.  Google has published a number of best practices around their datacenters.  As can be seen, the aisles are often enclosed or curtained off to increase efficiency, but they are still reliant on the cooling medium, in this case air, to extract the heat.  A more efficient solution is liquid cooling, where a liquid is used to perform the same function.  Hardcore Computing has developed a system that immerses the hot components in a coolant which is pumped out of the computer room for processing.  Used as an ambient heat source, this must have appeal.
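
To put a number on the conversion losses described above, here is a small sketch; the per-stage efficiencies are assumptions chosen for illustration, not measurements of any particular UPS or power supply.

```python
# A minimal sketch of why the AC -> DC -> AC -> DC chain is wasteful.
# Per-stage efficiencies are illustrative assumptions.

conversion_stages = {
    "utility AC -> DC (UPS rectifier)": 0.95,
    "DC -> 'clean' AC (UPS inverter)":  0.95,
    "AC -> DC (server power supply)":   0.90,
}

end_to_end = 1.0
for stage, efficiency in conversion_stages.items():
    end_to_end *= efficiency
    print(f"{stage}: {efficiency:.0%}")

# With these assumed figures, nearly a fifth of the utility power is lost
# before a single instruction runs, and the lost energy reappears as heat
# that the cooling plant then has to remove.
print(f"End-to-end conversion efficiency: {end_to_end:.1%}")
```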

So if we could limit the conversions and the attendant inefficiencies, as well as contain the heat in fluid, we could optimize the compute side of the DC.  The next step is managing the compute load efficiently.  This is currently done by means of virtualization.  In effect, compute resources are abstracted from the hardware and fooled into thinking that they are running alone on a dedicated platform when, in reality, multiple instances are running on the same physical compute.  That means that once the hardware has been provisioned for power and cooling, deploying a virtual server has little additional impact on the environment, because it extracts more efficient use of the hardware.  This has resulted in huge virtual server farms, with the attendant difficulty of managing server sprawl and movement.  Vendors have become adept at moving virtual servers around to optimize the hardware, which, whilst providing redundancy, produces a number of issues when trying to manage the environment.  If the hardware on which a server is running fails, the resource needs to be migrated.  This can lead to capacity bottlenecks, and monitoring systems are often not capable of tracking these migrations; they produce false positives if the metrics being used to track the service are not flexible enough to allow for this churn.
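
As a toy illustration of the placement and bottleneck problem, the sketch below checks whether any host has headroom to absorb a migrated virtual server; the host names, sizes and utilization ceiling are invented for the example and are not any vendor's scheduling algorithm.

```python
# Toy model: when a host fails, its virtual servers must land somewhere
# with enough headroom, or a capacity bottleneck results.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_cores: int
    ram_gb: int
    vms: list = field(default_factory=list)  # list of (cpu_cores, ram_gb) tuples

    def headroom(self, ceiling: float = 0.8):
        """CPU and RAM still available while keeping a safety margin."""
        used_cpu = sum(cpu for cpu, _ in self.vms)
        used_ram = sum(ram for _, ram in self.vms)
        return (self.cpu_cores * ceiling - used_cpu,
                self.ram_gb * ceiling - used_ram)

def place(vm, hosts):
    """Return the first host that can absorb the VM, or None (a bottleneck)."""
    cpu_needed, ram_needed = vm
    for host in hosts:
        free_cpu, free_ram = host.headroom()
        if free_cpu >= cpu_needed and free_ram >= ram_needed:
            host.vms.append(vm)
            return host
    return None

hosts = [Host("esx01", 32, 256, [(8, 64), (16, 128)]),
         Host("esx02", 32, 256, [(4, 32)])]

# A virtual server from a failed host looking for a new home.
target = place((8, 64), hosts)
print(target.name if target else "capacity bottleneck: no host can take the VM")
```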

Monitoring systems are only as good as their metrics.  Most modern systems expose metrics for performance and health, but these are often tagged on as afterthoughts.  They address operating requirements and are often only exposed via proprietary interfaces, which makes it difficult to obtain a meaningful, integrated view of the business application for consumption by IT and, ultimately, the business.  It is an ongoing operational challenge to ensure that these metrics are accurate and relevant to the business.  One way to look at them is to reduce them to a business-facing metric such as the length of time it takes to perform a user transaction, ideally a basket of representative transactions.  If these are tracked and fed into the monitoring system over time, a baseline is obtained which becomes the basis for measuring system performance.

Most modern monitoring systems are aware that there are multiple methods of obtaining these measurements and will consume them in a variety of ways, for example SNMP traps, WMI queries or ssh calls.  Some are reactive, such as SNMP traps, which rely on the occurrence of an event before notification can take place.  Active polling, on the other hand, can miss an event if the counter being polled is not in breach at the time of polling.  Agents deployed on the endpoint provide much better responsiveness but consume resources, which are often scarce at the very time they are needed; I have seen agents bring servers to a grinding halt by consuming resources to the point that the server cannot function.  The ideal is therefore a blended solution that utilizes the most lightweight method possible to obtain the counters that make up the baseline.  This is made even more challenging when applications can "float" up to the cloud, often seemingly without control.
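
A minimal sketch of the baselining idea follows, assuming a simple rolling window and a standard-deviation breach test; the window size and threshold are illustrative choices, not a product default.

```python
# Rolling baseline for a "basket" of representative transaction times.
# Window size and sigma threshold are illustrative assumptions.
from collections import deque
from statistics import mean, stdev

class TransactionBaseline:
    def __init__(self, window: int = 100, sigma: float = 3.0):
        self.samples = deque(maxlen=window)  # rolling window of response times (s)
        self.sigma = sigma                   # how many std devs count as a breach

    def record(self, seconds: float) -> bool:
        """Add a measurement; return True if it breaches the current baseline."""
        breach = False
        if len(self.samples) >= 30:          # need some history before alerting
            mu, sd = mean(self.samples), stdev(self.samples)
            breach = seconds > mu + self.sigma * sd
        self.samples.append(seconds)
        return breach

baseline = TransactionBaseline()
for t in [0.42, 0.45, 0.40] * 12:            # warm up with normal timings
    baseline.record(t)
print(baseline.record(0.44))                 # False: within the baseline
print(baseline.record(2.50))                 # True: likely degradation
```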

So, as seems to be indicated above, in a perfect Ultimate Datacenter there is a predictable supply of well managed and cooled servers running on redundant, or near redundant, energy-efficient platforms.  In this world, proactive monitoring alerts the support technician to trends indicating a potential malfunction that could lead to degraded performance or a system outage.  The business is tracking the metrics that matter to it and is working with correlated IT metric feeds that address its performance concerns.  IT is alert to the business performance metrics that form part of its negotiated and agreed service level tasks.  Nirvana?  In the upcoming articles, I will attempt to address and define certain vital pieces that will transform the datacenter and drive it up the maturity curve.
