Due to a problem with my wrists, hopefully temporary, I am
writing this short article using voice recognition software. As this
is the first time I’ve used this software, I'm actually relatively
pleased with how well it works. I am able to write e-mails, send
Twitter and Facebook updates, and do many routine tasks using
my voice. Modulo a few typos, I can even write this document
(although putting this into ACM format required help). However,
a significant portion of my life nowadays involves interacting
with data, and when it comes to data interaction, “web for all”
may as well just be a slogan. Even without accessibility problems,
whether your goal is to enter, discover, or integrate data, or to try
to understand what some particular data is telling you, it isn’t
easy. Add in disability, and the problem is made much worse.
In this keynote, I will discuss some of the issues that arise as
people try to use the “broad data” that can be found on the World
Wide Web. The modern combination of “lightweight” semantics,
based to a large degree on the rapidly maturing products of early
Semantic Web research, coupled with the “big data” tools that
have moved away from traditional relational databases, provides
an area of exploration that is pushing research in new and
interesting directions. Tim Berners-Lee’s call for “Raw Data
Now” is being heeded in many quarters, and other forces,
including those of transparency and innovation, are creating vast
repositories of data that are available without restriction.
As an example, governments around the world have been posting
data sets on the web at a really amazing rate. In the past year and
a half, my research group has identified and indexed the metadata
for well over 700,000 open government datasets from around the
world¹. This includes, at the time of this writing, datasets from
more than thirty countries and international organizations in 16
different languages (we currently anticipate having more than 1
million data sets by the end of this calendar year). Our research
has explored how to create, index and search metadata from this
immense federated catalog space. We have also been developing
tools for helping users to create linked data from these data sets
and to use that linked data in the development of visualizations and
other presentations that make the data more accessible. We are
also working with the US government on bringing these
techniques to the US Data.gov project.
In this talk, which I admit includes parts that are far more
speculative than practical at this point in time, I will explore how
the link spaces among the data provide the underpinnings of
potential new applications that will help bring data analytics into
our personal lives. By making data more personalized, we may be
able to achieve new forms of data integration that let all of us
interact more fully with the important data that affects us in our
everyday lives (such as health and well-being), and not just in our
professional careers. I contend that similar techniques could be
used to help increase the accessibility of data on the web.
Linked-data approaches have been helping to some degree in this
arena, but still leave a lot to be desired. In short, I will explore
some exciting things happening on the Web of Data, but bemoan
the challenges that still remain in providing scalable access to it.