The dictionary data structure is ubiquitous in computer science. A dictionary
is used for maintaining a set S under insertion and deletion of
elements (referred to as keys) from a universe U. Membership queries
(“x ∈ S?”) provide access to the data. In case of a positive answer the
dictionary also provides a piece of satellite data that was associated with
x when it was inserted.
A large theory, partially surveyed in Section 2, is devoted to dictionaries.
It is common to study the case where keys are bit strings in U = {0, 1}
w
and w is the word length of the computer (for theoretical purposes modeled
as a RAM). Section 3 discusses this restriction. It is usually, though not
always, clear how to return associated information once membership has
been determined. E.g., in all methods discussed in this paper, the associated
information of x ∈ S can be stored together with x in a hash table.
Therefore we disregard the time and space used to handle associated information
and concentrate on the problem of maintaining S. In the following
we let n denote |S|.
The most efficient dictionaries, in theory and in practice, are based on
hashing techniques. The main performance parameters are of course lookup
time, update time, and space. In theory there is no trade-off between these:
One can simultaneously achieve constant lookup time, expected amortized
constant update time, and space within a constant factor of the information
theoretical minimum of B = log
|U|
n
bits [6]. In practice, however, the
various constant factors are crucial for many applications. In particular,
lookup time is a critical parameter. It is well known that one can achieve
performance arbitrarily close to optimal if a sufficiently sparse hash table is
used. Therefore the challenge is to combine speed with a reasonable space
usage. In particular, we only consider schemes using O(n) words of space.
The contribution of this paper is a new, simple hashing scheme called
Cuckoo Hashing. A description and analysis of the scheme is given in
Section 4, showing that it possesses the same theoretical properties as the
dynamic dictionary of Dietzfelbinger et al. [10]. That is, it has worst case
constant lookup time and amortized expected constant time for updates.
A special feature of the lookup procedure is that (disregarding accesses
to a small hash function description) there are just two memory accesses,
which are independent and can be done in parallel if this is supported by
the hardware. Our scheme works for space similar to that of binary search
trees, i.e., three words per key in S on average.
Using weaker hash functions than those required for our analysis, Cuckoo
Hashing is very simple to implement. Section 5 describes such an implementation,
and reports on extensive experiments and comparisons with the
most commonly used methods, having no nontrivial worst case guarantee
CUCKOO HASHING 3
on lookup time. It seems that an experiment comparing the most commonly
used methods on a modern multi-level memory architecture has not
previously been described in the literature. Our experiments show Cuckoo
Hashing to be quite competitive, especially when the dictionary is small
enough to fit in cache. We thus believe it to be attractive in practice, when
a worst case guarantee on lookups is desired.