Conclusion
In this paper, we present an efficient, privacy-preserving K-means clustering algorithm for a social network setting. In our mechanism, the private data of the users, the sensitive intermediate values, and the final cluster assignments are protected by means of encryption. The service provider, who does not hold the decryption key, can still perform clustering without being able to access the content of the private data. While processing encrypted data gives users concrete privacy protection, it also incurs performance drawbacks compared to the plaintext version, due to data expansion after encryption and expensive operations on the encrypted data. Previous work has proposed different approaches to reduce the complexity of privacy-preserving K-means clustering, such as using semi-trusted third parties. In this work, we build a mechanism on the common client-server model and reduce the costs by employing data packing. In this way, we reduce the number of encryptions by a factor of K, yielding a considerable gain in terms of communication and computation. We also avoid interactive protocols such as secure comparison by exploiting the distributed setting. In addition, we distribute trust among multiple randomly selected users in each iteration of the protocol, which introduces a computational gain proportional to the number of such users. The resulting cryptographic protocol is significantly more efficient than previous work in the semi-honest security model. We also analyze the effects of different parameter choices on the performance of the cryptographic protocol. Experimental results support our claim that privacy-preserving K-means clustering is feasible: clustering 100,000 users takes 26 min. This result, which can be improved further on a real system, encourages the deployment of privacy-preserving K-means clustering algorithms based on homomorphic encryption.
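As an illustration of why data packing reduces the number of encryptions by a factor of K, the following minimal sketch packs K values into a single integer, so that one encryption could carry all K of them. It operates on plain integers only and contains no encryption; the slot width `SLOT_BITS`, the helper names `pack` and `unpack`, and the sample values are assumptions chosen for illustration and do not reproduce the exact packing layout of the protocol. The sketch relies on the fact that adding two packed integers adds their slots independently, provided no slot overflows, which is the property an additively homomorphic scheme would preserve on ciphertexts.

```python
def pack(values, slot_bits):
    """Pack non-negative integers into one integer, one fixed-width slot each."""
    packed = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << slot_bits):
            raise ValueError("value does not fit in its slot")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


# Hypothetical parameters: K slots of 32 bits each (an assumption for this sketch).
K = 4
SLOT_BITS = 32

a = [3, 10, 7, 1]   # e.g. K per-cluster values held by one user
b = [5, 2, 8, 4]    # the corresponding values of another user

packed_a = pack(a, SLOT_BITS)
packed_b = pack(b, SLOT_BITS)

# A single integer addition sums all K slots at once, as long as no
# slot exceeds SLOT_BITS bits (headroom must be chosen accordingly).
sums = unpack(packed_a + packed_b, K, SLOT_BITS)
print(sums)
```

With packing, each user would encrypt one packed value instead of K separate ones; the slot width has to leave enough headroom for the largest sum accumulated over all contributing users.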