To apply k-means-based clustering to spam detection on Twitter, we made two modifications to the StreamKM++
algorithm [5]. First, we added a parameter e that acts as a distance threshold: a point is considered part of a cluster found
by k-means++ only if it lies within distance e of that cluster's center. Second, we implemented a detection scheme that
classifies each point as soon as it arrives from the stream. For each new point, our modified StreamKM++ uses the most recent
set of centers to find the nearest coreset cluster and computes the Euclidean distance from the point to that cluster's center.
If this distance exceeds e, the point is flagged as spam and is not inserted into the nearest coreset. In this way, spam
instances are kept out of the coresets.
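The per-point detection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names (`ThresholdedStreamDetector`, `process`) are hypothetical, the centers are assumed to be given (in StreamKM++ they would come from running k-means++ on the current coreset), and each "coreset cluster" is stood in for by a plain list. The merge-and-reduce coreset tree and periodic reseeding are deliberately omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ThresholdedStreamDetector:
    """Hypothetical sketch of the e-threshold spam check.

    Assumes the most recent k-means++ centers are supplied up front; coreset
    construction and recomputation of centers are NOT implemented here.
    """

    def __init__(self, centers: List[Point], e: float) -> None:
        self.centers = centers
        self.e = e
        # One buffer per center, standing in for the points admitted to that coreset cluster.
        self.buffers: List[List[Point]] = [[] for _ in centers]

    def nearest(self, p: Point) -> Tuple[int, float]:
        """Return (index, distance) of the center closest to p."""
        dists = [euclidean(p, c) for c in self.centers]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def process(self, p: Point) -> bool:
        """Classify p on arrival; return True if it is flagged as spam."""
        idx, d = self.nearest(p)
        if d > self.e:
            # Spam: the point is rejected and never enters a coreset.
            return True
        # Not spam: admit the point to its nearest coreset cluster.
        self.buffers[idx].append(p)
        return False
```

For example, with centers `(0.0, 0.0)` and `(10.0, 10.0)` and `e = 2.0`, the point `(0.5, 0.5)` is admitted to the first cluster, while `(5.0, 5.0)`, about 7.07 from both centers, is flagged as spam and discarded. Because the comparison is strict, a point exactly at distance e is still admitted, matching the "above the value of e" rule.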
The modified StreamKM++ is further defined in Algorithm