Quality Control for Real-time Ubiquitous Crowdsourcing
Afra J. Mashhadi
Dept. of Computer Science
University College London
London WC1E 6BT, UK
a.jahanbakhshmashhadi@cs.ucl.ac.uk
ABSTRACT
Crowdsourcing has become a successful paradigm in
the past decade, as Web 2.0 users have taken a more
active role in producing content as well as consuming
it. Recently this paradigm has broadened to incorporate
ubiquitous applications, in which smartphone users
contribute information about their surroundings, thus
providing collective knowledge about the physical world.
However, the acceptance and openness of such applications
have made it easy to contribute poor-quality content.
Various solutions have been proposed in the Web-based
domain to assist with monitoring and filtering poor-quality
content, but these methods fall short when applied to
ubiquitous crowdsourcing, where the task of collecting
information has to be performed continuously and in
real-time, by an ever-changing crowd. In this paper we
discuss the challenges of quality control in ubiquitous
crowdsourcing and propose a novel technique that reasons
on users' mobility patterns and the quality of their past
contributions to estimate each user's credibility.
Author Keywords: Crowdsourcing, Quality Assur-
ance, Participation.
ACM Classification Keywords:
Social and Behavioural Sciences.
General Terms: Human Factors, Verification.
INTRODUCTION
Thanks to the widespread adoption of powerful and net-
worked (i.e., Internet-enabled) handheld devices, con-
sumers of digital content are now taking a more active
role in producing content on the go. This trend has al-
lowed a new category of applications to surface, in which
data is collected by participants and is collectively used
to offer services to citizens [3]. In this paper we focus
on a new stream of research known as ubiquitous crowd-
sourcing, in which the contributed information is not
limited to passively-generated sensor-readings from the
device, but also includes proactively-generated user
opinions and perspectives, which are processed to offer
real-time services to participants.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
UbiCrowd’11, September 18, 2011, Beijing, China.
Copyright 2011 ACM 978-1-4503-0927-1/11/09...$10.00.
For example, in a large city such as London, where
there exists a complex transport network with unavoidable
disruptions, users' engagement in travel updates can be
very valuable. By introducing a ubiquitous crowdsourcing
application for public transportation, participants could
actively contribute real-time information related to
their journeys. These contributions could include, for
instance, information about accidents, unplanned road
closures, congestion, and other highly dynamic events
that affect users' journeys, but that transport authorities
do not have the capacity to process as promptly as
required. If such information could be gathered in
real-time, users could be given useful updates while their
journeys unfold, so as to dynamically and effectively
adapt their travel plans [6]. In
such scenarios, the real-time information contributed
by participants can be invaluable, and thus the more
participants are engaged in providing this information,
the better such applications can work [4]. However,
the very same openness characteristic of such applica-
tions can threaten their success and impact the correct-
ness of the results, as they allow anyone to contribute
information. Indeed, a field trial of a ubiquitous
crowdsourcing application has shown that users are
concerned about the credibility of data provided by the
other participants [21]. Therefore, quality control in
crowdsourcing applications is an important issue that
cannot be neglected. In the Web domain, this challenge has
been highlighted and effectively tackled by using various
approaches such as aggregation and reporting. How-
ever, ubiquitous crowdsourcing exhibits unique proper-
ties that lead to different requirements for controlling
the quality of contributions. These properties are as
follows:
• Real-time Events: in ubiquitous crowdsourcing the
task of collecting information is often tightly linked
to events which are highly dynamic. Furthermore,
the collected information needs to be analysed, and
the results provided to users, in real-time. For in-
stance, in the above scenario, participants can upload
information about the status of the bus journey they
are taking, and the collected information has to be
processed in real-time to give waiting users an estimate
of the buses' actual arrival times. This requirement
of processing contributions in real-time differs from
what is observed in web-based crowdsourcing, where
the applications can achieve quality assurance by re-
lying on users (or authorised users) to flag and report
poor quality content with some time delay.
• Dynamic Crowds: as opposed to web-based crowd-
sourcing, in ubiquitous crowdsourcing the crowd set
(i.e., participants) keeps changing all the time. Let
us refer back to the public transport scenario, where
the crowd that can contribute travel information is
formed by public transport users, undertaking their
daily journeys. Such a crowd varies throughout the day
(e.g., the travellers who can report disruptions on a
bus route), and it may not always reach the critical
mass required for such applications to function (e.g.,
night-bus riders may be just a handful). Sparsity of
contributions by small crowds is a well-known chal-
lenge in web-based crowdsourcing systems too, with
severe impact on content quality [4]. However, the
web-based techniques cannot be applied directly in our
domain, because of the real-time and highly dynamic
nature of the applications at hand.
To address the above challenges, we propose a technique
which estimates the quality of contributions based on
each contributor's mobility, as well as a trustworthiness
score derived from their past contributions. We
combine these two sets of information to estimate a
credibility weight for each contributor, allowing us to
compute the results based on a weighted average of
all the uploaded contributions. We continue this pa-
per with an overview of the current state-of-the-art in
quality control within the crowdsourcing paradigm; we
then proceed to our novel quality control technique and
lay out our evaluation plan.
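To make the aggregation step concrete, the following is a minimal sketch of a credibility-weighted average. All names, and the mixing parameter alpha used to blend the mobility weight with the trust score, are illustrative assumptions; the paper does not prescribe a specific formula.

```python
# Illustrative sketch of credibility-weighted aggregation. The blend
# parameter `alpha` and all function names are assumptions.

def credibility(mobility_weight, trust_score, alpha=0.5):
    """Blend the mobility-based weight with the past-contribution
    trust score into a single credibility weight in [0, 1]."""
    return alpha * mobility_weight + (1 - alpha) * trust_score

def weighted_estimate(contributions):
    """contributions: list of (value, mobility_weight, trust_score).
    Returns the credibility-weighted average of the reported values."""
    weights = [credibility(m, t) for _, m, t in contributions]
    total = sum(weights)
    if total == 0:
        return None  # no credible contributions to aggregate
    return sum(w * v for w, (v, _, _) in zip(weights, contributions)) / total

# Example: three reported bus delays (in minutes). The contributor
# with low mobility weight and low trust (reporting 7.0) is
# down-weighted, pulling the estimate towards the other two reports.
reports = [(5.0, 0.9, 0.8), (7.0, 0.2, 0.3), (5.5, 0.8, 0.9)]
estimate = weighted_estimate(reports)
```

The estimate stays close to the two high-credibility reports (5.0 and 5.5 minutes) rather than the outlier, which is the intended effect of weighting contributions by contributor credibility.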
RELATED WORK
Web-based Crowdsourcing
Web-based crowdsourcing systems have gained popu-
larity in recent years; in this domain, content quality
is assured by means of manual editing, aggregation,
and user reputation. In [9], the authors propose a
decision matrix for selecting one of the quality assur-
ance techniques based on task complexity. In practice
(e.g., Slashdot, Reddit, and Digg), content quality is of-
ten achieved by relying on users and their social net-
works to report and filter inappropriate content from
the vast volume of online stories. Other systems such
as Wikipedia and IMDB restrict the quality control to
only a pre-defined set of editors (i.e., experts), who can
delete and correct poor quality articles (e.g., spam, mis-
information, abusive language). In these cases, the ed-
itors are often chosen based on their profile and histor-
ical data on their previous contributions, such as the
number of edited articles.
In addition to examples of centralised web-based crowd-
sourcing applications, semi-distributed crowdsourcing
systems have also been extensively studied by the re-
search community. An example of this research stream
is Volunteered Geographic Information (VGI), in which
the data is contributed by participants from the physi-
cal world and used to maintain and enhance the overall
body of environmental knowledge (e.g., OpenStreetMap
and WikiMapia). In [14], the authors investigate the
quality of voluntarily tagged Points of Interest (POIs),
and propose an algorithm which aggregates contributed
data to retain only the POIs that are consistently re-
peated. In [5], Flanagin et al. discuss the issue of cred-
ibility of VGI by arguing that credibility is a measure
of trustworthiness more than of expertise (i.e., data ac-
curacy). That is, credibility is less about data accuracy
and more about which information, or perspective, peo-
ple believe in. In [1], the need for a trust model that
takes into account the subjectivity of geographic infor-
mation and the user's perspective is discussed.
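The consistency-based aggregation idea described above can be sketched as follows. This is a simplified illustration in the spirit of [14], not their actual algorithm: a POI is retained only if it is reported by a minimum number of distinct contributors within the same spatial grid cell. The grid discretisation, thresholds, and all names are assumptions.

```python
from collections import defaultdict

# Simplified sketch of consistency-based POI aggregation: retain a POI
# only if at least `min_reports` distinct contributors reported it
# within the same grid cell. Cell size and threshold are assumptions.

def retain_consistent_pois(reports, cell_size=0.001, min_reports=3):
    """reports: list of (user_id, lat, lon, poi_name).
    Returns the set of POI names deemed consistently repeated."""
    supporters = defaultdict(set)
    for user_id, lat, lon, name in reports:
        # Snap coordinates to a coarse grid so nearby reports collide.
        cell = (round(lat / cell_size), round(lon / cell_size))
        supporters[(name, cell)].add(user_id)
    return {name for (name, _), users in supporters.items()
            if len(users) >= min_reports}

# Example: "Cafe" is reported by three users at nearby coordinates and
# is retained; "Ghost" has a single report and is filtered out.
reports = [
    ("u1", 51.5000, -0.1300, "Cafe"),
    ("u2", 51.5001, -0.1301, "Cafe"),
    ("u3", 51.5002, -0.1299, "Cafe"),
    ("u4", 51.6000, -0.1000, "Ghost"),
]
kept = retain_consistent_pois(reports)
```

Counting distinct contributors, rather than raw reports, prevents a single user from promoting a POI by submitting it repeatedly.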