If you think about what we are trying to do at a high level, there are
two questions to consider: (1) How do we map objects onto nodes, and
(2) How do we route a request to the node that is responsible for a given
object? We start with the first question, which has a simple statement:
How do we map an object with name x into the address of some node n
that is able to serve that object? While traditional peer-to-peer networks
have no control over which node hosts object x, if we could control how
objects get distributed over the network, we might be able to do a better
job of finding those objects at a later time.
A well-known technique for mapping names into an address is to use a
hash table, so that
hash(x) −−→n
implies object x is first placed on node n, and at a later time a client trying
to locate x would only have to perform the hash of x to determine that it
is on node n. A hash-based approach has the nice property that it tends
to spread the objects evenly across the set of nodes, but straightforward
hashing algorithms suffer from a fatal flaw: How many possible values of
n should we allow? (In hashing terminology, how many buckets should
there be?) Naively, we could decide that there are, say, 101 possible hash
values, and we use a modulo hash function; that is,
hash(x)
return x % 101
Unfortunately, if there are more than 101 nodes willing to host objects,
thenwe can’t take advantage of all of them.On the other hand, ifwe select
a number larger than the largest possible number of nodes, then there will
be some values of x that will hash into an address for a node that does not
exist. There is also the not-so-small issue of translating the value returned
by the hash function into an actual IP address.
To address these issues, structured peer-to-peer networks use an algorithm
known as consistent hashing, which hashes a set of objects x uniformly
across a large ID space. Figure 9.25 visualizes a 128-bit ID space as
a circle, where we use the algorithm to place both objects
hash(object name) −−→objid
and nodes
hash(IP addr) −−→nodeid