Clustering is performed using the Macnaughton-Smith [13] divisive clustering
algorithm. The rating distributions for the active item over each of the clusters are
then compared. Since the goal of an attacker is to force the predicted ratings of
targeted items to a particular value, it is reasonable to expect that the ratings
given to targeted items in any attack profiles are centered on the attack value,
which is likely to deviate significantly from the mean of the authentic
neighbours’ ratings. Thus an attack is deemed to have taken place if the difference in the
means for the two clusters is sufficiently large. The cluster with the smaller standard
deviation is determined to be the attack cluster.
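
The decision rule described above can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes the two clusters have already been produced by the divisive clustering step, and the function name `detect_attack_cluster` and the cut-off `mean_gap_threshold` are hypothetical, since the text only requires the difference in means to be "sufficiently large".

```python
from statistics import mean, pstdev


def detect_attack_cluster(ratings_a, ratings_b, mean_gap_threshold=1.0):
    """Return 'a' or 'b' for the suspected attack cluster, or None if no attack.

    ratings_a, ratings_b: ratings of the active item within each cluster.
    mean_gap_threshold: assumed cut-off for a "sufficiently large" gap in means.
    """
    if not ratings_a or not ratings_b:
        return None  # nothing to compare

    # An attack is flagged only if the cluster means differ enough.
    gap = abs(mean(ratings_a) - mean(ratings_b))
    if gap < mean_gap_threshold:
        return None

    # Attack ratings concentrate on the attack value, so the cluster
    # with the smaller standard deviation is taken to be the attack cluster.
    return "a" if pstdev(ratings_a) < pstdev(ratings_b) else "b"
```

For example, a cluster whose ratings are all 1 compared against a cluster with ratings spread between 2 and 5 would be flagged as the attack cluster, whereas two clusters with similar means would produce no detection.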
Results for this algorithm (measured using precision and negative predictive
value, NPV) applied to an informed
nuke attack on the MovieLens dataset are reproduced in Figure 25.9. The fraction of
authentic users contained in the cluster identified as the cluster of authentic users
is at least 75% for all attack sizes tested, so attack profiles are being effectively
filtered from the system. However, particularly for small attack sizes, a significant
proportion of the attack cluster is made up of authentic users. The cost of removing
malicious profiles is to also lose authentic profiles that may have contributed to the
accuracy of the prediction. Results show that filtering a system that has not been
attacked leads to an increase of around 10% in the MAE.
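
The two measures used above can be computed as in the sketch below, which uses the standard definitions: precision is the fraction of profiles in the attack cluster that are genuine attack profiles, and NPV is the fraction of profiles in the authentic cluster that are genuinely authentic. The function name and the boolean-list input format are assumptions made for illustration.

```python
def precision_and_npv(flagged, is_attack):
    """Compute (precision, NPV) for a filtering decision.

    flagged:   list of bools, True if the profile was placed in the attack cluster.
    is_attack: list of bools, True if the profile really is an attack profile.
    Either value is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for f, a in zip(flagged, is_attack):
        if f and a:
            tp += 1  # attack profile correctly filtered
        elif f and not a:
            fp += 1  # authentic profile lost to filtering
        elif not f and not a:
            tn += 1  # authentic profile correctly kept
        else:
            fn += 1  # attack profile that slipped through

    precision = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return precision, npv
```

A low precision corresponds to the situation noted above, in which many authentic users end up in the attack cluster, while a high NPV indicates that the retained cluster consists mostly of authentic users.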