Returning to the question of how the graph evolves, a node certainly
has to know about at least one other node when it joins a Gnutella overlay.
The new node is attached to the overlay by at least this one link. After that,
a given node learns about other nodes as the result of QUERY RESPONSE
messages, both for objects it requested and for responses that just happen
to pass through it. A node is free to decide which of the nodes it discovers
in this way it wants to keep as neighbors. The Gnutella protocol also provides
PING and PONG messages: a node sends a PING to probe whether a given
neighbor still exists, and that neighbor answers with a PONG.
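
The neighbor bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not part of the Gnutella specification: the class name `NeighborTable`, the `max_neighbors` limit, and the `timeout` value are all assumptions, and the policy of keeping a newly discovered node whenever there is room is just one of many choices a node is free to make.

```python
class NeighborTable:
    """Hypothetical per-node table of neighbors and their last PONG time."""

    def __init__(self, max_neighbors=4, timeout=30.0):
        self.max_neighbors = max_neighbors  # assumed cap on neighbors
        self.timeout = timeout              # assumed seconds without a PONG
        self.last_pong = {}                 # neighbor address -> last PONG time

    def learn(self, addr, now):
        """Offer a node seen in a QUERY RESPONSE; keep it if there is room."""
        if addr in self.last_pong or len(self.last_pong) >= self.max_neighbors:
            return False
        self.last_pong[addr] = now
        return True

    def on_pong(self, addr, now):
        """Record that a neighbor answered a PING."""
        if addr in self.last_pong:
            self.last_pong[addr] = now

    def expire(self, now):
        """Drop and return neighbors that have been silent for too long."""
        dead = [a for a, t in self.last_pong.items() if now - t > self.timeout]
        for a in dead:
            del self.last_pong[a]
        return dead


table = NeighborTable(max_neighbors=2, timeout=30.0)
table.learn("a", now=0.0)
table.learn("b", now=0.0)
table.on_pong("a", now=25.0)       # "a" answers a PING, "b" stays silent
silent = table.expire(now=40.0)    # "b" has been silent for 40 s
```

In this run the table forgets "b" and keeps "a". A real node would also have to send the PINGs and decide what to do when its table becomes empty.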
It should be clear that Gnutella as described here is not a particularly
clever protocol, and subsequent systems have tried to improve upon it.
One dimension along which improvements are possible is in how queries
are propagated. Flooding has the nice property that it is guaranteed to
find the desired object in the fewest possible hops (provided the object lies
within the query's TTL), but it does not scale well. Instead, a node can
forward queries to randomly chosen neighbors, or to the neighbors most
likely to succeed based on past results. A second dimension is to proactively
replicate the objects, since the more copies of a given object there
are, the easier it should be to find a copy. Alternatively, one could develop
a completely different strategy, which is the topic we consider next.
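
To make the trade-off between flooding and random forwarding more concrete, here is a minimal Python sketch. It is a simulation under stated assumptions rather than an implementation of the protocol: the overlay is a dictionary mapping each node to a list of neighbors, `holders` is the set of nodes that store the object, every node keeps forwarding until the TTL runs out, and a node drops duplicate copies of a query it has already seen. The function names `flood` and `random_walk` are hypothetical.

```python
import random
from collections import deque


def flood(graph, start, holders, ttl):
    """Flood a query; return (hops to nearest holder or None, messages sent)."""
    seen = {start}
    frontier = deque([(start, None, 0)])  # (node, node it came from, hops)
    messages = 0
    found = None
    while frontier:
        node, parent, hops = frontier.popleft()
        if found is None and node in holders:
            found = hops  # breadth-first order makes this the fewest hops
        if hops == ttl:
            continue  # TTL exhausted, do not forward further
        for nbr in graph[node]:
            if nbr == parent:
                continue  # never send the query back where it came from
            messages += 1
            if nbr not in seen:  # duplicates are received but dropped
                seen.add(nbr)
                frontier.append((nbr, node, hops + 1))
    return found, messages


def random_walk(graph, start, holders, ttl, rng):
    """Forward one copy to a random neighbor per hop; same return shape."""
    node = start
    for hops in range(ttl + 1):
        if node in holders:
            return hops, hops  # one message per hop taken
        if hops < ttl:
            node = rng.choice(graph[node])
    return None, ttl


# A six-node ring: node i is connected to i-1 and i+1 (mod 6).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
flood_result = flood(ring, start=0, holders={3}, ttl=3)
walk_result = random_walk(ring, start=0, holders={3}, ttl=3, rng=random.Random())
```

On this ring, flooding finds the object in three hops at the cost of six messages. The random walk never sends more than three messages, but it finds the object only if it happens to take three steps in the same direction.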