Paper
In any case, the Web and the SEs are no substitute for the classical, well-loved libraries.
Looking back, libraries can be classified into three types:
1. Analog/Paper Library (PL) - the classical paper library with its card
catalog.
2. Automated/Hybrid Library (AL) - an analog library with a computerized catalog.
3. Digital Library (DL) - a computerized library in which most of the
information is digital.
The problems of our regular libraries are well known and need not be detailed
here. It is less clear, however, what a digital library is and what its various
characteristics are.
First, we classify digital libraries into three categories:
1. Single Digital Library (SDL) - the regular classical library implemented in
a fully computerized fashion.
2. Federated Digital Library (FDL) - this is a federation of several
independent libraries, centered on a common theme, on the network.
3. Harvested Digital Library (HDL) - this is a virtual library providing
summarized access to related material scattered over the network.
We then compare the various types of libraries, focusing on a
comprehensive comparison between HDLs and SEs on the Web. To
illustrate, we present example digital libraries. In particular, we mention the
Katsir HDL, based on the Harvest system, which is currently being developed
at Bar-Ilan University.
1 Introduction
The Internet and the Web have been growing by leaps and bounds over the past
few years, aggravating the problem of information explosion, a phenomenon
well known to all of us. According to Nature [1], the publicly indexable Web
contained an estimated 800 million pages as of February 1999. The growing
number of Search Engines (SEs) that have popped up everywhere, now more
than 2400, enables us to access cyberspace, but these engines also flood us
with vast amounts of irrelevant information. Search engine coverage has
recently decreased substantially, with no engine indexing more than about 16%
of the estimated size of the publicly indexable Web [1].
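To make the coverage figures concrete, the following minimal sketch works out what a 16% ceiling means in absolute page counts. The two constants are the figures quoted above; the function and variable names are illustrative assumptions, not part of the cited study.

```python
# Figures quoted in the text above (Nature, February 1999).
ESTIMATED_WEB_PAGES = 800_000_000   # estimated size of the publicly indexable Web
MAX_ENGINE_COVERAGE = 0.16          # upper bound on coverage of any single engine


def pages_indexed(total_pages: int, coverage: float) -> int:
    """Return the approximate number of pages covered at a given coverage ratio."""
    return round(total_pages * coverage)


indexed = pages_indexed(ESTIMATED_WEB_PAGES, MAX_ENGINE_COVERAGE)
missed = ESTIMATED_WEB_PAGES - indexed
print(f"best single engine: at most about {indexed:,} pages indexed")
print(f"not reachable through that engine: at least about {missed:,} pages")
```

Under these figures, even the best single engine leaves roughly five out of every six publicly indexable pages unreachable.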
The article is structured as follows. This section presents the resource repository
hierarchy, defines the notion of a library, and traces the development from paper to
digital libraries. The following section classifies digital libraries, compares
the different types, and introduces the logical harvesting model. We
conclude with a discussion.
1.1 Resource Repositories Hierarchy
Both SEs and Digital Libraries (DLs) are Internet Resource Discovery (IRD)
tools. We introduce a resource repositories hierarchy with two major
paradigms, search engines and digital libraries, each of which branches into
categories. SEs can be classified into three categories: Basic-SE, Directory, and
Meta-SE. All three categories offer a search user interface, but they differ
significantly in how they are constructed:
1. Basic-SE/Index - a tool that uses an automatic robot/crawler to gather
metadata on items.
Digital libraries on the Internet ...nference Programme and Proceedings http://www.ifla.org/IV/ifla66/papers/029-142e.htm
2. Directory/Catalog/Guide - a tool that uses human judgement to collect and
catalog items.
3. Meta-SE - a tool that holds no database of its own, but rather queries
Basic-SEs upon a user request.
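The difference between a Basic-SE and a Meta-SE can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of any real engine: each Basic-SE is modeled as a function over a prebuilt keyword index (standing in for what a crawler would gather), and the names `make_basic_se` and `meta_search`, as well as the example URLs, are hypothetical.

```python
from typing import Callable, Dict, List

# A Basic-SE is modeled as a function from a query string to a ranked list of URLs.
BasicSE = Callable[[str], List[str]]


def make_basic_se(index: Dict[str, List[str]]) -> BasicSE:
    """Build a toy Basic-SE over a prebuilt keyword index (stand-in for crawler output)."""
    def search(query: str) -> List[str]:
        return index.get(query.lower(), [])
    return search


def meta_search(query: str, engines: List[BasicSE]) -> List[str]:
    """Forward the query to every Basic-SE and merge the results.

    The Meta-SE keeps no index of its own; it only removes duplicate URLs,
    preserving the order in which they were first returned.
    """
    merged: List[str] = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Two hypothetical Basic-SEs with partially overlapping indices.
se_a = make_basic_se({"harvest": ["http://a.example/1", "http://shared.example/x"]})
se_b = make_basic_se({"harvest": ["http://shared.example/x", "http://b.example/2"]})

results = meta_search("Harvest", [se_a, se_b])
print(results)
```

A Directory would differ from both in that its index entries are selected and cataloged by people rather than gathered by a crawler or borrowed from other engines.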
A detailed discussion of digital libraries, including DL categories, is
presented in Section 2.
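The resource repositories hierarchy can also be written down as a small tree, which makes the two paradigms and their categories explicit. The sketch below is one possible encoding, assuming the SE categories listed above and the three DL categories (SDL, FDL, HDL) named earlier in the paper; the `Node` class and its `find_path` method are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One entry in the hierarchy, with its sub-categories as children."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def find_path(self, target: str) -> Optional[List[str]]:
        """Return the names from this node down to `target`, or None if absent."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub_path = child.find_path(target)
            if sub_path is not None:
                return [self.name] + sub_path
        return None


# IRD tools split into the two paradigms, each with three categories.
HIERARCHY = Node("IRD Tool", [
    Node("Search Engine", [Node("Basic-SE"), Node("Directory"), Node("Meta-SE")]),
    Node("Digital Library", [Node("SDL"), Node("FDL"), Node("HDL")]),
])

path = HIERARCHY.find_path("HDL")
print(" > ".join(path))
```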
1.2 What is a library
Before we delve into digital libraries, we define the notion of a library in
general and of a digital library specifically. We define a library as having six
major characteristics:
1. Collection of data objects - A library holds a collection of data objects,
also called holdings, items, resources, or just material. The items can be
books and journals, documents (e.g., HTML pages), and multimedia
objects (such as pictures, tapes, or video files). The library
objects can be available locally in the library, or indirectly, by using a
network to access them.
2. Collection of metadata structures - A library contains a collection of
metadata structures, such as catalogs, guides, dictionaries, thesauri,
indices, summaries, annotations, glossaries, etc.
3. Collection of services - A library provides a collection of services, such as
various access methods (search, browse, etc.) for different users,
management of the library, logging/statistics, Performance
Measurement Evaluation (PME), and Selective Dissemination of
Information (SDI), also called push mode.
4. Domain focus - A library has a domain focus and its collection has a
purpose. For example: art, science, or literature. Also, it is usually created
to serve a community of users, and therefore is finely grained. For
example: academic, public, special, school, national, or state library.
5. Quality control - A library uses quality control in the sense that all its
material is