Identifying malicious web sites has become a major challenge
in today's Internet. Previous work focused on detecting
if a web site is malicious by dynamically executing JavaScript
in instrumented environments or by rendering web sites in
client honeypots. Both techniques bear a significant evaluation
overhead, since the analysis can take up to tens of seconds or
even minutes per sample.
In this paper, we introduce a novel, purely static analysis
approach, the Δ-system, that (i) extracts change-related
features between two versions of the same web site, (ii) uses
a machine-learning algorithm to derive a model of web site
changes, (iii) detects if a change was malicious or benign, (iv)
identifies the underlying infection vector campaign based on
clustering, and (v) generates an identifying signature.
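
To make the first step concrete, the following is a minimal sketch of what extracting change-related features from two versions of a page could look like. It is not the system's actual feature extraction: the helper names (`TagCounter`, `count_tags`, `change_features`), the choice of per-tag counts as features, and the example URL are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def count_tags(html):
    """Parse the HTML string and return a Counter of its start tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


def change_features(old_html, new_html):
    """Return per-tag counts of elements added and removed between versions."""
    old, new = count_tags(old_html), count_tags(new_html)
    added = new - old      # Counter subtraction keeps only positive counts
    removed = old - new
    return {"added": dict(added), "removed": dict(removed)}


# Hypothetical example: the newer version gained an external script.
old_version = "<html><body><p>Hello</p></body></html>"
new_version = ('<html><body><p>Hello</p>'
               '<script src="http://example.invalid/x.js"></script>'
               '</body></html>')
features = change_features(old_version, new_version)
```

In a pipeline of the kind described above, a feature vector like this would feed the learning and clustering steps. A realistic extractor would presumably also consider attributes, element positions, and script contents.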
We demonstrate the effectiveness of the Δ-system by evaluating
it on a dataset of over 26 million pairs of web sites, running it
alongside a web crawler for a period of four months. Over
this time span, the Δ-system successfully identified previously
unknown infection campaigns, including a campaign that
targeted installations of the Discuz!X Internet forum software
by injecting infection vectors into these forums and redirecting
forum readers to an installation of the Cool Exploit Kit.