Predictions are deemed significant if they can be shown to be successful beyond random chance. Methods of statistical hypothesis testing are therefore used to determine the probability that an earthquake like the one predicted would have happened anyway (the null hypothesis). The predictions are then evaluated by testing whether they correlate with actual earthquakes better than the null hypothesis.
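One simple form of such a test treats the null hypothesis as a memoryless (Poisson) process: each target earthquake then falls inside a declared alarm window with probability equal to the fraction of time covered by alarms, and the number of "hits" is binomial. The sketch below is illustrative only; the function name and the sample numbers are hypothetical, and real evaluations of alarm-based forecasts use more elaborate machinery.

```python
from math import comb

def poisson_null_p_value(n_quakes: int, n_hits: int,
                         alarm_time: float, total_time: float) -> float:
    """One-sided p-value for a prediction record under a Poisson null.

    If earthquakes occur uniformly at random in time, each of the
    n_quakes target events lands inside an alarm window independently
    with probability p = alarm_time / total_time, so the number of
    hits is Binomial(n_quakes, p).  Returns the chance of scoring at
    least n_hits by luck alone.
    """
    p = alarm_time / total_time
    return sum(
        comb(n_quakes, k) * p**k * (1 - p) ** (n_quakes - k)
        for k in range(n_hits, n_quakes + 1)
    )

# Hypothetical record: alarms covered 5% of the study period and caught
# 12 of 100 target earthquakes.  A small p-value means the predictions
# correlate with the catalog better than the Poisson null would allow.
print(poisson_null_p_value(n_quakes=100, n_hits=12,
                           alarm_time=5.0, total_time=100.0))
```

As the next paragraph notes, this kind of null is exactly what real seismicity violates: earthquakes cluster, so a memoryless baseline can make even trivial strategies look skillful.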
In many instances, however, earthquake occurrence is not statistically homogeneous: earthquakes cluster in both space and time. In southern California about 6% of M≥3.0 earthquakes are "followed by an earthquake of larger magnitude within 5 days and 10 km." In central Italy, 9.5% of M≥3.0 earthquakes are followed by a larger event within 48 hours and 30 km. While such statistics are not satisfactory for purposes of prediction (giving ten to twenty false alarms for each successful prediction), they will skew the results of any analysis that assumes earthquakes occur randomly in time, for example as from a Poisson process. It has been shown that a "naive" method based solely on clustering can successfully predict about 5% of earthquakes, slightly better than chance.
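The "ten to twenty false alarms" figure follows directly from the quoted follow-on probabilities: if a fraction p of events is followed by a larger one, a rule that raises an alarm after every event incurs roughly (1 − p)/p false alarms per successful prediction. A minimal check of that arithmetic, using only the rates quoted above:

```python
# False alarms per successful prediction implied by the follow-on rates
# quoted above: (1 - p) / p for a rule that alarms after every event.
for region, p in [("southern California", 0.06), ("central Italy", 0.095)]:
    print(f"{region}: {(1 - p) / p:.1f} false alarms per hit")
# Prints roughly 15.7 and 9.5, in line with the ten-to-twenty range.
```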
As the purpose of short-term prediction is to enable emergency measures to reduce death and destruction, failure to give warning of a major earthquake that does occur, or at least an adequate evaluation of the hazard, can result in legal liability, or even political purging. But warning of an earthquake that does not occur also incurs a cost: not only the cost of the emergency measures themselves, but also that of civil and economic disruption. False alarms, including alarms that are cancelled, also undermine the credibility, and thereby the effectiveness, of future warnings.