If you think about what we are trying to do at a high level, there are two questions to consider: (1) How do we map objects onto nodes, and (2) How do we route a request to the node that is responsible for a given object? We start with the first question, which has a simple statement: How do we map an object with name x into the address of some node n that is able to serve that object? While traditional peer-to-peer networks have no control over which node hosts object x, if we could control how objects get distributed over the network, we might be able to do a better job of finding those objects at a later time.
A well-known technique for mapping names into an address is to use a hash table, so that
hash(x) −−→n
implies object x is first placed on node n, and at a later time a client trying to locate x would only have to perform the hash of x to determine that it is on node n. A hash-based approach has the nice property that it tends to spread the objects evenly across the set of nodes, but straightforward hashing algorithms suffer from a fatal flaw: How many possible values of n should we allow? (In hashing terminology, how many buckets should there be?) Naively, we could decide that there are, say, 101 possible hash values, and we use a modulo hash function; that is,
hash(x)
return x % 101
Unfortunately, if there are more than 101 nodes willing to host objects, then we can’t take advantage of all of them. On the other hand, if we select a number larger than the largest possible number of nodes, then there will be some values of x that will hash into an address for a node that does not exist. There is also the not-so-small issue of translating the value returned by the hash function into an actual IP address.
To address these issues, structured peer-to-peer networks use an algorithm known as consistent hashing, which hashes a set of objects x uniformly across a large ID space. Figure 9.25 visualizes a 128-bit ID space as a circle, where we use the algorithm to place both objects
hash(object name) −−→objid
and nodes
hash(IP addr) −−→nodeid