1. Scalable and Memory-Efficient Clustering of Massive Graphs
A popular class of graph clustering algorithms for large-scale networks, such as PMetis, KMetis and Graclus, is based on a multilevel framework. Generally, these multilevel algorithms work reasonably well on networks with a few million vertices. However, when the network size increases to the scale of 10 million vertices or greater, the performance of these algorithms rapidly degrades. An inherent property of social networks, the power law degree distribution, makes these algorithms infeasible to apply to large-scale social networks. We propose a scalable and memory-efficient clustering algorithm, GEM, which efficiently extracts a good skeleton graph from the original graph, and propagates the clustering result of the extracted graph to the rest of the network. Experimental results show that GEM produces clusters of quality comparable to or better than existing state-of-the-art graph clustering algorithms, while it is much faster and consumes much less memory. Furthermore, the parallel implementation of GEM, called PGEM, not only produces higher quality of clusters but also achieves much better scalability than most current parallel graph clustering algorithms.