Measuring Efficiency of a WSC
A widely used, simple metric to evaluate the efficiency of a datacenter or a WSC
is called power usage effectiveness (or PUE):
PUE = (Total facility power)/(IT equipment power)
Thus, PUE must be greater than or equal to 1, and the bigger the PUE, the less
efficient the WSC.
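To make the definition concrete, here is a minimal sketch of the PUE arithmetic in Python; the power figures are hypothetical, chosen only to illustrate the calculation rather than drawn from any particular facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT equipment power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical facility drawing 10 MW in total, of which 6 MW reaches the
# IT equipment (servers, storage, and networking gear).
total_kw = 10_000.0
it_kw = 6_000.0
print(f"PUE = {pue(total_kw, it_kw):.2f}")  # 1.67; the other 4 MW goes to cooling,
                                            # power distribution losses, lighting, etc.
```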
Greenberg et al. [2006] reported on the PUE of 19 datacenters and the portion
of the overhead that went into the cooling infrastructure. Figure 6.11 shows what
they found, sorted by PUE from most to least efficient. The median PUE is 1.69,
with the cooling infrastructure using more than half as much power as the servers
themselves—on average, 0.55 of the 1.69 is for cooling. Note that these are average
PUEs, which can vary daily depending on workload and even external air
temperature, as we shall see.
Since performance per dollar is the ultimate metric, we still need to measure
performance. As Figure 6.7 above shows, bandwidth drops and latency increases
depending on the distance to the data. In a WSC, the DRAM bandwidth within a
server is 200 times larger than within a rack, which in turn is 10 times larger than
within an array. Thus, there is another kind of locality to consider in the placement
of data and programs within a WSC.
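A back-of-the-envelope calculation makes the placement effect concrete. The sketch below uses the relative bandwidth ratios just cited (server-local DRAM bandwidth 200 times the rack level, which is 10 times the array level) together with a hypothetical 20 GB/sec local DRAM bandwidth, purely to show how transfer time grows with the distance to the data.

```python
# Relative DRAM bandwidth at each level of the hierarchy, from the ratios in the
# text: within a server it is 200x the rack level, which is 10x the array level.
RELATIVE_BANDWIDTH = {"server": 200 * 10, "rack": 10, "array": 1}

LOCAL_DRAM_GBPS = 20.0  # hypothetical server-local DRAM bandwidth in GB/sec

def transfer_time_s(block_gb: float, level: str) -> float:
    """Estimate the time to read a block of data placed at the given level."""
    gbps = LOCAL_DRAM_GBPS * RELATIVE_BANDWIDTH[level] / RELATIVE_BANDWIDTH["server"]
    return block_gb / gbps

for level in ("server", "rack", "array"):
    print(f"{level:>6}: {transfer_time_s(1.0, level):8.2f} sec to move 1 GB")
```

With these assumptions, moving 1 GB takes 0.05 seconds within the server, 10 seconds within the rack, and 100 seconds within the array, which is why the placement of data and programs matters so much.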
While designers of a WSC often focus on bandwidth, programmers developing
applications on a WSC are also concerned with latency, since latency is visible
to users. Users’ satisfaction and productivity are tied to the response time of a
service. Several studies from the timesharing days report that user productivity is
inversely proportional to the time for an interaction, which was typically broken
down into human entry time, system response time, and the time for the person to
think about the response before entering the next entry. The results of experiments
showed that cutting system response time by 30% shaved the time of an interaction
by 70%. This implausible result is explained by human nature: People
need less time to think when given a faster response, as they are less likely to get
distracted and remain “on a roll.”
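A small worked example shows why the arithmetic only works out if think time shrinks along with response time; the numbers below are purely hypothetical and are not taken from the studies cited.

```python
# Interaction time = human entry time + system response time + think time.
# Hypothetical baseline, in seconds, chosen only to illustrate the arithmetic.
entry, response, think = 1.0, 1.0, 8.0
baseline = entry + response + think            # 10.0 sec

# Cutting response time by 30% alone saves just 0.3 sec (3% of the interaction).
faster_response = 0.7 * response

# For the whole interaction to shrink by roughly 70%, think time must also drop
# sharply -- the behavioral effect described above (users stay "on a roll").
shorter_think = 0.15 * think
improved = entry + faster_response + shorter_think
print(f"baseline {baseline:.1f} sec -> improved {improved:.1f} sec "
      f"({100 * (1 - improved / baseline):.0f}% shorter)")
```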
Figure 6.12 shows the results of such an experiment for the Bing search engine,
where delays of 50 ms to 2000 ms were inserted at the search server. As expected
from previous studies, time to next click roughly doubled the delay; that is, a 200
ms delay at the server led to a 500 ms increase in time to next click. Revenue
dropped linearly with increasing delay, as did user satisfaction. A separate study on
the Google search engine found that these effects lingered long after the 4-week
experiment ended. Five weeks later, there were 0.1% fewer searches per day from
users who experienced 200 ms delays, and there were 0.2% fewer searches per day from
users who experienced 400 ms delays. Given the amount of money made in search,
even such small changes are disconcerting. In fact, the results were so negative that
they ended the experiment prematurely [Schurman and Brutlag 2009].
Because of this extreme concern with the satisfaction of all users of an Internet
service, performance goals typically specify that a high percentage of
requests be below a latency threshold, rather than just offering a target for the
average latency. Such threshold goals are called service level objectives (SLOs) or
service level agreements (SLAs). An SLO might be that 99% of requests must be
below 100 milliseconds. Thus, the designers of Amazon’s Dynamo key-value
storage system decided that, for services to offer good latency on top of
Dynamo, their storage system had to deliver on its latency goal 99.9% of the
time [DeCandia et al. 2007]. For example, one improvement of Dynamo helped
the 99.9th percentile much more than the average case, which reflects their
priorities.
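To illustrate how a threshold goal differs from an average-latency target, the sketch below checks a batch of request latencies against a hypothetical SLO of 99% of requests under 100 ms; the workload numbers are made up for the example.

```python
import random

def meets_slo(latencies_ms, threshold_ms=100.0, target_fraction=0.99):
    """Return True if at least target_fraction of requests finish under threshold_ms."""
    under = sum(1 for t in latencies_ms if t < threshold_ms)
    return under / len(latencies_ms) >= target_fraction

# Hypothetical workload: most requests are fast, but there is a slow tail.
random.seed(42)
latencies = ([random.gauss(30, 10) for _ in range(9900)] +
             [random.gauss(300, 50) for _ in range(150)])

average = sum(latencies) / len(latencies)
print(f"average latency: {average:.1f} ms")                   # ~34 ms -- looks healthy
print(f"meets 99%-under-100-ms SLO: {meets_slo(latencies)}")  # False -- the slow tail fails it
```

The average latency looks healthy even though the SLO is missed, which is exactly why tail-oriented goals such as Dynamo's 99.9th-percentile target are used instead of averages.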