Having walked through my share of data centers, I always find it interesting to see the heterogeneous amalgamation of IT gear that has accumulated since each one was commissioned. Every data center designer and manager starts out with grand ideals about a pristine architecture, but the actual complexion of the data center changes dramatically over time. We are left with rows and rows of assorted gear, all happily consuming power and blinking LEDs, with perhaps 20%-30% of these devices no longer in use… Zombies abound!
Perhaps Zombies is a harsh word, but the concept fits. A non-trivial portion of the devices in the data center are powered, generating heat, consuming precious IP addresses, and yet performing NO actual work. Why? Their intended application changed over time, the project was never completed, their original workload was shifted elsewhere, or they were part of a test bed that was never dismantled. There are a dozen other reasons why large quantities of machines enter the Zombie realm, but there we have it: machine after machine in the living dead state, and WORSE THAN THAT, we do not have enough information about these devices to TURN THEM OFF. So they sit, consuming resources in the safety of the data center, avoiding decommissioning… And here’s the rub: a server idling along, running nothing but the operating system, consumes 60%-70% of its total power before any workload is applied! A server doing NO work is wasting roughly two-thirds of its maximum rated power! Note to self: this is a real issue and not something we can choose to overlook any longer. With the price of power at record highs, and rising by 7% per year as far as we can see into the future, WE HAVE to find these Zombies and kill them.
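To put a rough number on that waste, here is a minimal back-of-the-envelope sketch in Python. The 500 W rating, the $0.10 per kWh price, and the function names are all hypothetical values chosen for illustration; only the roughly two-thirds idle fraction and the 7% yearly figure (read here as a price increase) come from the text above.

```python
HOURS_PER_YEAR = 24 * 365


def idle_cost_per_year(rated_watts, idle_fraction, price_per_kwh):
    """Yearly electricity cost of a server idling at idle_fraction of its rated power."""
    idle_kw = rated_watts * idle_fraction / 1000.0
    return idle_kw * HOURS_PER_YEAR * price_per_kwh


def projected_costs(first_year_cost, annual_increase, years):
    """Cost for each year if the price of power rises by annual_increase every year."""
    return [first_year_cost * (1 + annual_increase) ** n for n in range(years)]


if __name__ == "__main__":
    # Hypothetical inputs: a 500 W server idling at 65% of rated power, $0.10 per kWh.
    cost = idle_cost_per_year(rated_watts=500, idle_fraction=0.65, price_per_kwh=0.10)
    for year, value in enumerate(projected_costs(cost, annual_increase=0.07, years=5), start=1):
        print(f"Year {year}: ${value:,.2f} for one Zombie")
```

Multiply the result by however many Zombies you suspect you have. Note that the sketch covers only the server's own draw and leaves out the cooling needed to remove the heat it generates.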
How can we reclaim the resources being consumed by these Zombies? We have to build designs that intelligently monitor power consumption and that proactively and continually test whether those resources are efficiently doing work. We need to observe power consumption either directly, using embedded sensors (such as those in Energy Star compliant servers), or with intelligent power distribution devices (ideally with per-outlet metrics). Here is the secret: Zombies all share a similar trait… their power consumption stays fairly constant. A server will likely consume almost two-thirds of its maximum power before any load is applied. A Zombie server will therefore continue to consume the same two-thirds of its rated value every time you look at it.
Creating new IT best practices that call for per-device power monitoring is the first step. The second step is deploying an intelligent monitoring tool that can look at the energy consumed by each device over longer periods of time. Some simple standard deviation math will expose servers whose power draw barely varies, and they can no longer hide their ‘walking dead’ status. Proactive monitoring will identify Zombies and allow you to reclaim power, space and cooling quite easily!
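As a rough illustration of that standard deviation math, here is a minimal Python sketch. It assumes you already have a series of per-device power readings in watts, for example exported from a metered power strip. The function names, the 2% variation threshold, and the 55%-75% idle band are assumptions made for this example, not values taken from any particular monitoring product, and they would need tuning against your own gear.

```python
from statistics import mean, pstdev


def is_zombie_candidate(samples_watts, rated_watts, max_cv=0.02, idle_band=(0.55, 0.75)):
    """Flag a device whose power draw is nearly flat and sits near its idle level.

    max_cv and idle_band are assumed thresholds for illustration only.
    """
    if len(samples_watts) < 2 or rated_watts <= 0:
        return False  # not enough data to judge
    average = mean(samples_watts)
    if average <= 0:
        return False  # device is off or the sensor is not reporting
    # Coefficient of variation: standard deviation relative to the mean draw.
    variation = pstdev(samples_watts) / average
    fraction_of_rated = average / rated_watts
    low, high = idle_band
    return variation <= max_cv and low <= fraction_of_rated <= high


def find_zombie_candidates(readings, rated_watts):
    """Return the sorted names of devices that look like Zombies.

    readings maps device name -> list of watt samples;
    rated_watts maps device name -> maximum rated power in watts.
    """
    return sorted(
        name
        for name, samples in readings.items()
        if is_zombie_candidate(samples, rated_watts[name])
    )


if __name__ == "__main__":
    # Hypothetical readings for two 500 W servers.
    readings = {
        "rack3-u12": [325, 326, 324, 325, 327],  # flat, near two-thirds of rated
        "rack3-u14": [330, 480, 350, 450, 300],  # draw moves with real workload
    }
    rated = {"rack3-u12": 500, "rack3-u14": 500}
    print(find_zombie_candidates(readings, rated))
```

A flagged device is only a candidate. A lightly used but still necessary server can look just as flat, so the list is a starting point for tracking down owners rather than a kill list. The longer the observation window, the more confident you can be that a flat line really means no work is being done.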