ABSTRACT
Of the many P2P file-sharing prototypes in existence, BitTorrent
is one of the few that has managed to attract millions
of users. BitTorrent relies on other (global) components
for file search, employs a moderator system to
ensure the integrity of file data, and uses a bartering technique
for downloading in order to prevent users from
freeriding. In this paper we present a measurement study
of BitTorrent in which we focus on four issues, viz. availability,
integrity, flashcrowd handling, and download performance.
The purpose of this paper is to aid in the understanding
of a real P2P system that apparently has the
right mechanisms to attract a large user community, to
provide measurement data that may be useful in modeling
P2P systems, and to identify design issues in such
systems.
1. INTRODUCTION
Even though many P2P file-sharing systems have been
proposed and implemented, only very few have stood the
test of intensive daily use by a very large user community.
The BitTorrent file-sharing system is one of these
systems. Measurements on Internet backbones indicate
that BitTorrent has evolved into one of the most popular
P2P networks [8]. In fact, BitTorrent traffic made up 53% of
all P2P traffic in June 2004 [12]. As BitTorrent is only a
file-download protocol, it relies on other (global) components,
such as web sites, for finding files. The most popular
web site for this purpose at the time we performed
our measurements was suprnova.org.
Several aspects are important for the
acceptance of a P2P system by a large user community.
First, such a system should have high availability. Second,
users should (almost) always receive a good version
of the content (no fake files) [10]. Third, the system
should be able to deal with flashcrowds. Finally,
users should obtain a relatively high download speed.
In this paper we present a detailed measurement study
of the combination of BitTorrent and Suprnova. This
measurement study addresses all four aforementioned
aspects. Our measurement data consist of detailed traces
of more than two thousand global components, gathered
over a period of 8 months (Jun’03 to Mar’04). In addition,
for one of the most popular files we followed all 90,155
downloading peers from the injection of the file until its
disappearance (several months). Over a period of two weeks,
we measured the bandwidth of 54,845 peers downloading
more than a hundred newly injected files. This makes our
measurement effort one of the largest of its kind conducted to date.
The contributions of this paper are the following. First,
we add to the understanding of the operation of a P2P file-sharing
system that, through its user-friendliness,
the quality of the content it delivers, and its performance,
apparently has the right mechanisms to attract millions of users.
Second, the results of this paper can aid in the (mathematical)
modeling of P2P systems. For instance, in the
fluid model in [13], it is assumed that the arrival process
and the abort and departure processes of downloaders are
Poisson, an assumption that clearly contradicts
our measurements. One of our main conclusions is that
within P2P systems a tension exists between availability,
which is improved when there are no global components,
and data integrity, which benefits from centralization.
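The Poisson assumption mentioned above can be screened against an arrival trace with a simple statistic. The Python sketch below is illustrative only and is not part of the measurement methodology of this paper: it computes the index of dispersion (variance-to-mean ratio) of arrival counts per fixed-length window, which is close to 1 for a homogeneous Poisson process. The functions `index_of_dispersion`, `poisson_arrivals`, and `decaying_arrivals`, the window length, and the exponentially decaying arrival rate used as a flashcrowd-like contrast are all hypothetical choices, not models or data taken from our traces.

```python
import math
import random
import statistics


def index_of_dispersion(arrivals, horizon, window):
    """Variance-to-mean ratio of arrival counts in fixed-length windows.

    Close to 1 for a homogeneous Poisson process; values well above 1
    indicate over-dispersed (bursty or non-stationary) arrivals.
    """
    n_bins = int(horizon // window)
    counts = [0] * n_bins
    for t in arrivals:
        b = int(t // window)
        if 0 <= b < n_bins:  # ignore arrivals outside [0, n_bins * window)
            counts[b] += 1
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("no arrivals in the observation period")
    return statistics.pvariance(counts) / mean


def poisson_arrivals(rate, horizon, rng):
    """Homogeneous Poisson process: i.i.d. exponential inter-arrival times."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return out
        out.append(t)


def decaying_arrivals(peak_rate, decay, horizon, rng):
    """Hypothetical flashcrowd-like trace with rate peak_rate * exp(-decay * t).

    Generated by thinning: candidates from a Poisson process of rate
    peak_rate are kept with probability exp(-decay * t).
    """
    t, out = 0.0, []
    while True:
        t += rng.expovariate(peak_rate)
        if t >= horizon:
            return out
        if rng.random() < math.exp(-decay * t):
            out.append(t)


if __name__ == "__main__":
    rng = random.Random(42)
    steady = poisson_arrivals(rate=10.0, horizon=1000.0, rng=rng)
    burst = decaying_arrivals(peak_rate=100.0, decay=0.01, horizon=1000.0, rng=rng)
    print("Poisson trace:", round(index_of_dispersion(steady, 1000.0, 1.0), 2))
    print("Decaying trace:", round(index_of_dispersion(burst, 1000.0, 1.0), 2))
```

Both synthetic traces have roughly the same mean number of arrivals per window, yet the decaying-rate trace yields a ratio far above 1 because the count per window follows the changing rate. A ratio well above 1 is only a quick screen for non-Poisson arrivals, not a formal goodness-of-fit test.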