It is estimated that over 90% of all new information produced
in the world is being stored on magnetic media, most of it on
hard disk drives. Despite their importance, there is relatively
little published work on the failure patterns of disk drives, and
the key factors that affect their lifetime. Most available data
are based either on extrapolation from accelerated aging experiments
or on relatively modest-sized field studies. Moreover,
larger population studies rarely have the infrastructure in place
to collect health signals from components in operation, which
is critical information for detailed failure analysis.
We present data collected from detailed observations of a
large disk drive population in a production Internet services deployment.
The population observed is many times larger than
that of previous studies. In addition to presenting failure statistics,
we analyze the correlation between failures and several
parameters generally believed to impact longevity.
Our analysis identifies several parameters from the drive’s
self-monitoring facility (SMART) that correlate highly with
failures. Despite this high correlation, we conclude that models
based on SMART parameters alone are unlikely to be useful
for predicting individual drive failures. Surprisingly, we found
that temperature and activity levels were much less correlated
with drive failures than previously reported.