The possibility of OS fingerprinting—the fact that idiosyncrasies
in the implementation of network protocols such as
TCP and IP are remotely detectable and hard to disguise—
has long been known, and there exist many software tools
that take advantage of it. However, both research and practice
have mainly focused on IPv4. In this paper we apply
machine learning techniques to the problem of OS fingerprinting
over IPv6, with a thorough examination of the fingerprinting
engine that we built and that is deployed in the
Nmap security scanner. Figure 1 shows sample output, which in
this case indicates that the remote host is running Linux
with one of a small range of kernel versions.
The engine’s design is guided by many years’ experience
in dealing with IPv4-based detection. It is fundamentally
based on a logistic regression model trained on a few hundred
known OS “fingerprints,” which are the packets received
in response to up to 18 specially crafted network probes. OS
fingerprinting has traditionally relied on a nearest-neighbor
match against a database of known fingerprints. We use
machine learning so that the engine can better identify
fingerprints that do not exactly match any it has seen before,
and can cope with the many ways a packet may be corrupted in
transit (unfortunately, many of the protocol fields useful for
fingerprinting are often modified by intermediate devices).
The great diversity of operating systems and of types of network
interference has in the past required a proportionally
large database of known fingerprints. The cost of maintaining
such a database is non-trivial, and we hope to make it
more manageable.
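
To illustrate why a trained classifier can generalize where an exact database match cannot, the following is a minimal sketch of binary logistic regression in plain Python. It is not the engine's actual model or feature set: the two features (a normalized hop limit and a normalized TCP window size), the sample values, and the helper names `train_logistic` and `predict_proba` are all hypothetical, and a real engine would distinguish many classes rather than two.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit binary logistic regression by full-batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical training fingerprints: [hop limit / 255, window size / 65535].
# Label 1 stands for one OS family, label 0 for everything else.
known = [
    ([0.25, 0.90], 1),
    ([0.25, 0.85], 1),
    ([0.24, 0.88], 1),
    ([0.50, 0.12], 0),
    ([0.50, 0.25], 0),
    ([0.49, 0.10], 0),
]
samples = [x for x, _ in known]
labels = [y for _, y in known]

weights, bias = train_logistic(samples, labels)

# A fingerprint altered in transit: it matches no database entry exactly.
unseen = [0.23, 0.80]
exact_match = unseen in samples
probability = predict_proba(weights, bias, unseen)
print("exact match found:", exact_match)
print("classified as positive class:", probability > 0.5)
```

An exact lookup finds nothing for the altered fingerprint, while the learned weights still place it on the same side of the decision boundary as the training examples it resembles.
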
The training set of known operating system fingerprints
was initially seeded by us through manual scans of common