strategies require that the size of the filtered cluster be specified and, in these results,
the cluster size is taken to be the actual number of inserted attack profiles. This point
should be taken into account in comparing the results with those obtained with the
neighbourhood filtering strategy (Figure 25.9), in which no such control on the cluster
size was applied. The maximum recall of 80% obtained for the PLSA strategy arises
because the wrong cluster is selected approximately 20% of the time. The
PCA clustering strategy shows very good performance, even in the case of attacks
consisting of a mixture of random, average and bandwagon profiles.
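The 80% ceiling on PLSA recall noted above can be read as a simple expectation over trials. As a rough illustration only, assume (this is not stated in the results) that recall is close to 1 when the correct cluster is selected and close to 0 when the wrong one is selected, and let p be a hypothetical symbol for the probability of selecting the correct cluster:

```latex
\[
\mathbb{E}[\text{recall}] \;\approx\; p \cdot 1 + (1 - p) \cdot 0 \;=\; p,
\qquad
p \approx 0.8 \;\Rightarrow\; \mathbb{E}[\text{recall}] \approx 0.8 .
\]
```

Under this simplifying assumption, the recall ceiling reflects how often the correct cluster is chosen rather than how well attack profiles are separated within it.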
The UnRAP algorithm [3] also uses clustering to distinguish attack profiles. This
algorithm uses a measure called the H_v score, which has proved successful in
identifying highly correlated biclusters in gene expression data. In the context of attack
detection, the H_v score measures, for each user, a sum of the squared deviations of
that user's ratings from the user mean, item mean and overall mean ratings: