Archive | DCIM

Ultimate Infrastructure

3 Feb

As I stated previously, there are many components that have to work together efficiently, and in harmony, to deliver the services that business has come to rely on so heavily. These components are complex and need to perform exceedingly well, or users in search of instant gratification will find that other business services are just a click away. To deliver these services, large datacenters costing millions of dollars are built to house the sheet metal and silicon that deliver them.

I wonder why a number of data centers are built far from human habitation, in very warm climates such as Arizona. I understand that cheap power is great, but surely cooling these environments consumes more than powering the computers contained in them? The ideal environment for computers seems to be similar to that for humans: a temperature range of 20-21°C (68-70°F) and relative humidity between 45% and 60% are deemed best for safe server operation. These conditions are managed by Building Management Systems and form part of a discipline called DCIM, or data center infrastructure management. Large Heating, Ventilation and Air Conditioning (HVAC) units are employed to manage the environment, and where that proves insufficient, Computer Room Air Conditioner (CRAC) units provide extra, directed point cooling. As such, data centers are built for efficiency and security; aesthetics are an expensive addition.
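As a rough illustration of the kind of threshold logic a BMS or DCIM tool applies, here is a minimal sketch in Python. The ranges are the ones quoted above; the function name, sensor readings and alerting behaviour are all invented for the example.

```python
# Hypothetical sketch: check a sensor reading against the recommended envelope.
SAFE_TEMP_C = (20.0, 21.0)        # recommended temperature range, degrees Celsius
SAFE_HUMIDITY_PCT = (45.0, 60.0)  # recommended relative humidity range, percent

def within_envelope(temp_c, humidity_pct):
    """Return True if the reading sits inside the recommended operating envelope."""
    temp_ok = SAFE_TEMP_C[0] <= temp_c <= SAFE_TEMP_C[1]
    humidity_ok = SAFE_HUMIDITY_PCT[0] <= humidity_pct <= SAFE_HUMIDITY_PCT[1]
    return temp_ok and humidity_ok

print(within_envelope(20.5, 50.0))  # True: inside the envelope
print(within_envelope(24.0, 50.0))  # False: too warm, time for the CRAC units to earn their keep
```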

Data centers get big, I mean Huge Data Centers, and all of them require redundant utilities and connectivity. Once those are laid on, security would be great, as would someplace for the people lucky enough to staff the DC to subsist. However, putting all your precious data eggs in a single basket is likely to prove foolhardy, so that means another DC of similar proportions with the attendant costs. Expensive! So how about a DC that looks like a home?

Infrastructure as a Service is a potential alternative to owning a DC, but some data and intellectual property is so sensitive, so core to a company, that it will always have to remain internal, whether for competitive advantage or as a result of regulation such as HIPAA or PCI. Another consideration is the control of one's own destiny: the sense of ownership and comfort that comes from owning, and being able to see and touch, the physical substance.

For these reasons, DCIM will continue to be important to companies. Large hosting companies will continue to improve the cost-to-profit ratios of their operations by driving efficiency in density, power usage and cooling, and the benefits of these efforts are likely to reach the entire industry.

So what does the ultimate DC infrastructure look like? I think we could see smaller, compact DCs containing high-density, efficient servers housing the core critical services and the “needs to be protected at all costs” data. These will be surrounded by the user access layer: a cloud of applications as services, hosted either internally or in a public cloud, running virtual desktops that can be securely accessed from pretty much anywhere, including user-owned tablets. Control of the information remains vested in the enterprise, as users access it via an emulation interface. It reminds me of the mainframe environment of not so long ago, or maybe it was aeons ago.

What are your thoughts?

The Ultimate Datacenter – for the moment.

17 Jan

One thing I have learned in all the years I have been doing IT is that there is always “the Next Big Thing”, “the Killer App” or the “Ultimate Datacenter”. As I look back over the years, I see many strategic inflection points (oh my, that term is so last century, 1998 to be exact) along the IT roadmap, points that mark out typical IT lifetimes.

These are what Andy Grove describes as “what happens to a business when a major change takes place in its competitive environment: a major change due to the introduction of new technologies, or a major change due to the introduction of a different regulatory environment.”

Moore’s law, meanwhile, describes a long-term trend in the history of computing hardware: the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years.

So if I map Grove’s comments onto Moore’s law, we have had roughly six IT lifetimes since then: many reinventions and many changes. So let me look at the datacenter for a moment.
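For what it is worth, the arithmetic behind that mapping is simple. A back-of-the-envelope sketch, assuming the 1998 inflection point mentioned above and treating the time of writing as roughly 2011:

```python
# Back-of-the-envelope: how many two-year "IT lifetimes" since the 1998 inflection point?
inflection_year = 1998    # strategic inflection point referenced above
current_year = 2011       # approximate time of writing (assumption)
doubling_period = 2       # years per Moore's-law doubling

lifetimes = (current_year - inflection_year) / doubling_period
print(f"Roughly {int(lifetimes)} IT lifetimes")  # Roughly 6 IT lifetimes
```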

Datacenters house the computer-based system resources that business consumes. By their very nature they are strategic, as they often house some of the most sensitive business IP and customer information. As such, they consume a lion’s share of the IT budget. They are also energy hogs, requiring power to run, generating heat and requiring cooling.

<ETLA Alert>

The science of managing these has led to a discipline called Data Center Infrastructure Management (DCIM), which we at N’compass are really good at. In particular, we have developed a metric, Cost to Compute (C2C), which allows business to see the cost of the DC as a functioning component of the business. It is a methodology that correlates Power Usage Effectiveness (PUE) and compute usage factors to produce a baseline from which to track the value of IT operational effectiveness.

</ETLA Alert>
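PUE itself is a published ratio (total facility power divided by IT equipment power). The snippet below is a purely illustrative sketch of correlating PUE with a compute usage factor; it is not the actual C2C methodology, and every figure and formula in it is an assumption made up for the example.

```python
# Illustrative sketch only: PUE plus a notional "cost per useful compute kWh".
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.
facility_kw = 1200.0        # assumed total facility draw, kW
it_equipment_kw = 750.0     # assumed IT equipment draw, kW
pue = facility_kw / it_equipment_kw
print(f"PUE: {pue:.2f}")    # 1.60

# Notional compute usage correlation: tariff and utilisation are assumptions.
tariff_per_kwh = 0.10       # assumed utility tariff, $/kWh
avg_utilisation = 0.45      # assumed average compute utilisation (0-1)
hours = 24 * 365
annual_energy_cost = facility_kw * hours * tariff_per_kwh
useful_compute_kwh = it_equipment_kw * hours * avg_utilisation
print(f"Notional cost per useful compute kWh: ${annual_energy_cost / useful_compute_kwh:.2f}")
```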

So, back to the point of this post, the Ultimate Datacenter: what is it and how does it currently look? Datacenters are typically built in “hard blocks”, constructed to standards and fitted according to requirement. However, they are often over-engineered because they need to take into account the inefficiencies of current power conversion technologies. For example, utility company AC power is converted to DC for the UPS batteries, converted back to “clean” AC to feed the computers, and then converted to DC yet again inside each server’s power supply. Yet another inefficiency is cooling. Air-cooled rooms with CRAC units have to be built big to obtain efficiencies: cool air is forced up into server racks through the raised flooring, drawn through the running components and out into hot aisles. Google has published a number of best practices around their datacenters. These aisles are often enclosed or curtained off to increase efficiency, but they still rely on the cooling medium, in this case air, to extract the heat. A more efficient solution is liquid cooling, where a liquid performs the same function. Hardcore Computing has developed a system that immerses the hot components in a coolant, which is pumped out of the computer room for processing; used as an ambient heat source, this must have appeal.
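To see why those conversions matter, here is a quick sketch of how the losses compound along that chain. The per-stage efficiency figures are assumptions chosen for illustration, not measured values.

```python
# Compounding losses along an assumed AC -> DC (UPS) -> AC -> DC (PSU) chain.
stage_efficiencies = {
    "utility AC -> UPS DC (rectifier/battery)": 0.94,  # assumed
    "UPS DC -> 'clean' AC (inverter)":          0.94,  # assumed
    "server AC -> internal DC (PSU)":           0.90,  # assumed
}

overall = 1.0
for stage, eff in stage_efficiencies.items():
    overall *= eff
    print(f"{stage}: {eff:.0%}")

print(f"Power actually delivered to the silicon: {overall:.0%} "
      f"({1 - overall:.0%} lost as heat that then has to be cooled)")
```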

So, if we could limit the conversions and their attendant inefficiencies, and contain the heat in fluid, we could optimize the compute side of the DC. The next step is managing the compute load efficiently, which is currently done by means of virtualization. In effect, compute resources are abstracted from the hardware and fooled into thinking they are running alone on a dedicated platform when, in fact, multiple instances are running on the same physical compute. That means that once the hardware has been provisioned, as far as power and cooling are concerned, deploying a virtual server has little additional impact on the environment, extracting more efficient use of the hardware. This has resulted in huge virtual server farms, with the difficulty of managing server sprawl and movement. Vendors have become adept at moving virtual servers around to optimize the hardware, which, while providing redundancy, produces a number of issues when trying to manage the environment. If the hardware on which a server is running fails, the resource needs to be migrated. This can lead to capacity bottlenecks, and monitoring systems are often not capable of tracking these migrations, producing false positives if the metrics being used to track the service are not flexible enough to allow for this churn.
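As a toy illustration of why those migrations create capacity headaches, here is a minimal first-fit placement sketch. The hosts, VM sizes and headroom policy are all invented for the example; real platforms weigh many more factors.

```python
# Toy first-fit placement: where can a displaced VM land without breaching headroom?
hosts = {                # hypothetical hosts: (RAM capacity in GB, currently allocated GB)
    "host-a": (256, 230),
    "host-b": (256, 180),
    "host-c": (256, 250),
}
HEADROOM = 0.90          # assumed policy: never allocate beyond 90% of a host

def place(vm_gb):
    """Return the first host that can absorb the VM, or None if capacity is exhausted."""
    for name, (capacity, allocated) in hosts.items():
        if allocated + vm_gb <= capacity * HEADROOM:
            return name
    return None

print(place(32))   # 'host-b': host-a and host-c are already too close to the ceiling
print(place(64))   # None: a capacity bottleneck the monitoring system ought to flag
```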

Monitoring systems are only as good as their metrics. Most modern systems expose metrics for performance and health, which are often tagged on as afterthoughts. These address operating requirements and are often only exposed via proprietary interfaces, which makes it difficult to obtain a meaningful, integrated view of the business application for consumption by IT and, ultimately, the business. It is an ongoing operational challenge to ensure that these metrics are accurate and relevant to the business. One way to look at these metrics is to reduce them to a business-facing metric such as the length of time it takes to perform a user transaction, ideally a basket of representative transactions. If these are tracked and fed into the monitoring system over time, a baseline is obtained which becomes the basis for measuring system performance. Most modern monitoring systems recognize that there are multiple methods of obtaining this data and will consume these measurements in a variety of ways, for example SNMP traps, WMI or ssh calls. Some are reactive, such as SNMP traps, which rely on the occurrence of an event before notification can take place. Active polling, on the other hand, can miss an event if the counter polled is not in breach at the time of polling. Agents deployed on the endpoint provide a much better response but consume resources, which are often scarce at the time of need; I have seen agents bring servers to a grinding halt by consuming resources to the point that the server cannot function. The ideal situation is therefore a blended solution that uses the most lightweight method possible to obtain the counters that make up the baseline. This is made even more challenging when applications can “float” up to the cloud, often seemingly without control.
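Here is a minimal sketch of the baselining idea described above, assuming we already have a feed of representative transaction times. The sample values, function name and three-sigma tolerance are illustrative choices, not prescriptions.

```python
# Minimal baseline sketch: flag transaction times that drift well beyond the norm.
from statistics import mean, stdev

# Assumed historical response times (seconds) for a representative basket of transactions.
baseline_samples = [1.8, 2.1, 1.9, 2.0, 2.2, 1.7, 2.0, 1.9, 2.1, 2.0]
baseline_mean = mean(baseline_samples)
baseline_stdev = stdev(baseline_samples)

def breaches_baseline(observed_seconds, tolerance=3.0):
    """True if the observed time sits more than `tolerance` standard deviations above baseline."""
    return observed_seconds > baseline_mean + tolerance * baseline_stdev

print(breaches_baseline(2.1))   # False: within normal variation
print(breaches_baseline(4.5))   # True: a trend worth alerting on before users notice
```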

So, as indicated above, in a perfect Ultimate Datacenter there is a predictable supply of well-managed and well-cooled servers running on redundant, or near-redundant, energy-efficient platforms. In this world, proactive monitoring alerts the support technician to trends indicating a potential malfunction that could lead to degraded performance or a system outage. The business is tracking the metrics that matter to it and working with correlated IT metric feeds that address its performance concerns. IT is alert to the business performance metrics that form part of its negotiated and agreed service levels. Nirvana? In the upcoming articles, I will attempt to address and define certain vital pieces that will transform the datacenter and drive it up the maturity curve.