1. Motivation
Information filtering (IF) systems are designed to
continuously scan document streams (e.g. news tickers
or Usenet). They classify incoming documents in order to
identify those that are potentially important or
interesting to a user. For designers of IF systems, the
conceptualization of user profiles is a major challenge:
a profile has to be defined for each user, and it determines
the criteria used for the user-specific classification of documents.
One approach (the explication approach) is to define
a formalized language that the user employs to
describe his profile. This approach has been
implemented in the IF prototypes Rama [Binkley 1991],
Borges V2 [Smeaton 1996] and Sift [Yan 2000]. The
main problem with this approach is that users are often
unable to specify their information demand
properly. There are two main reasons for this: Firstly, it is
difficult for a user to make the required criteria explicit. Secondly,
the formalized language has to be powerful enough to
deal with the challenges of natural language processing
such as inflected word forms, synonyms and polysemes. Additionally, it should be powerful enough to allow complex
expressions (e.g. using Boolean operators such as “or”,
“and” and “not”). This creates a dilemma: if the language
is very powerful, the user needs a great deal of time to
master it; if it is easy to use but not powerful enough,
the filtering results of the IF system will be deficient.
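To make the explication approach concrete, the following Python sketch shows what a minimal Boolean profile language might look like. It is an illustration only and does not reproduce the languages of Rama, Borges V2 or Sift; the tuple-based profile format and the function names tokenize and matches are assumptions introduced here. The example also hints at why such profiles are hard to write well: the user has to list the inflected forms “filter” and “filtering” by hand and has to exclude the unwanted sense of the ambiguous word “filter” explicitly.

```python
import re


def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(profile, tokens):
    """Evaluate a Boolean profile against a set of tokens.

    A profile is a nested tuple (hypothetical format):
      ("term", word), ("and", p1, p2, ...), ("or", p1, p2, ...), ("not", p)
    """
    op = profile[0]
    if op == "term":
        return profile[1] in tokens
    if op == "and":
        return all(matches(p, tokens) for p in profile[1:])
    if op == "or":
        return any(matches(p, tokens) for p in profile[1:])
    if op == "not":
        return not matches(profile[1], tokens)
    raise ValueError(f"unknown operator: {op!r}")


# The user must enumerate inflected forms and rule out the unwanted
# sense ("coffee filter") by hand.
profile = (
    "and",
    ("or", ("term", "filter"), ("term", "filtering")),
    ("not", ("term", "coffee")),
)

relevant = matches(profile, tokenize("Adaptive filtering of news streams"))
irrelevant = matches(profile, tokenize("A new coffee filter for your kitchen"))
```

Even in this toy setting the profile grows quickly once synonyms and further word senses have to be covered, which is the trade-off between expressive power and ease of use described above.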
One solution to this dilemma is the use of an adaptive
approach. The idea is to present some evaluated
documents to the IF system and to let it generate the user
profile on its own. As a side effect, the system can
continuously improve the user profile if the user
gives feedback on misclassified documents. This
approach has already been implemented in the NewsSIEVE
[Haneke 2001] and PI-Agent [Kuropka 2001] systems.
NewsSIEVE adapts the user profile by using evolutionary
algorithms, while the PI-Agent uses neural networks.
Both approaches have in common that the initial
information about a user profile (the training set) is
transformed into an internal representation (e.g. neuron
weights in the case of a neural network), which makes the
profile difficult for users to understand. As a result,
the system cannot explain its classification rules
in a user-friendly way. This leads to the following
problems: Firstly, the user has to rely on the classification
given by the IF system without knowing in detail how it
is produced. Secondly, if the
user’s information demand shifts from one day to the next,
or the system is unable to adapt to his information demand,
he cannot make reasonable corrections to
his profile. Consequently, the user has to wait until the
system has corrected his profile automatically.
In the meantime, many documents may be misclassified.
Our intention is to use a case-based approach for
defining user profiles. This means that the user defines his
profile by presenting some evaluated documents to the
system, as in the adaptive approach. In contrast to the