is found to be John.
Another level of analysis is called semantic analysis. In contrast to the
parsing process, which merely identifies the grammatical role of each word,
semantic analysis is charged with the task of identifying the semantic role of
each word in the statement. Semantic analysis seeks to identify such things as
the action described, the agent of that action (which might or might not be the
subject of the sentence), and the object of the action. It is through semantic
analysis that the sentences “Mary gave John a birthday card” and “John got a
birthday card from Mary” would be recognized as saying the same thing.
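To make the idea concrete, the following is a minimal sketch, not a real semantic analyzer. It assumes the grammatical roles of each sentence have already been identified by a parser, and it uses two hypothetical helper functions (frame_for_gave and frame_for_got_from) and a hypothetical Frame record to map those roles onto semantic roles. The role names action, agent, recipient, and obj are illustrative choices rather than a standard vocabulary.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """A hypothetical record of the semantic roles in one statement."""
    action: str
    agent: str
    recipient: str
    obj: str


def frame_for_gave(subject, indirect_object, direct_object):
    # Pattern "X gave Y Z": the grammatical subject is the agent (the giver).
    return Frame("transfer", agent=subject,
                 recipient=indirect_object, obj=direct_object)


def frame_for_got_from(subject, direct_object, source):
    # Pattern "Y got Z from X": the grammatical subject is the recipient,
    # and the agent appears inside the "from" phrase.
    return Frame("transfer", agent=source,
                 recipient=subject, obj=direct_object)


gave = frame_for_gave("Mary", "John", "birthday card")
got = frame_for_got_from("John", "birthday card", "Mary")

# Different grammatical subjects, identical semantic frames.
print(gave == got)
```

Although the two sentences have different subjects, both reduce to the same frame, which is the sense in which semantic analysis recognizes them as saying the same thing.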
A third level of analysis is contextual analysis. It is at this level that the
context of the sentence is brought into the understanding process. For example,
it is easy to identify the grammatical role of each word in the sentence
The bat fell to the ground.
We can even perform semantic analysis by identifying the action involved as
“falling,” the agent as “bat,” and so on. But it is not until we consider the context of the
statement that the meaning of the statement becomes clear. In particular, it has a
different meaning in the context of a baseball game than it does in the context of
cave exploration. Moreover, it is at the contextual level that the true meaning of
the question “Do you know what time it is?” would finally be revealed.
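One very simple way to bring context into the process can be sketched as follows. This is an illustration under strong assumptions, not a description of how practical systems work: the SENSES table and its cue words are invented for this example, and the disambiguate function merely counts how many cue words for each sense appear in the surrounding text.

```python
# Hypothetical table: each sense of a word is paired with cue words that
# suggest that sense when they occur nearby.
SENSES = {
    "bat": {
        "baseball bat": {"baseball", "pitcher", "inning", "batter", "game"},
        "flying mammal": {"cave", "wings", "stalactite", "explorer", "roost"},
    }
}


def disambiguate(word, context):
    """Return the sense of `word` whose cue words overlap most with `context`."""
    context_words = set(context.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context_words))


at_the_ballpark = disambiguate("bat", "the batter swung hard in the ninth inning")
underground = disambiguate("bat", "we followed the explorer deep into the cave")
print(at_the_ballpark)
print(underground)
```

The same word receives a different interpretation depending on the text around it. A question such as “Do you know what time it is?” is far harder, because its true meaning depends on the speaker's intent rather than on nearby vocabulary.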
We should note that the various levels of analysis—syntactic, semantic, and
contextual—are not necessarily independent. The subject of the sentence
Stampeding cattle can be dangerous.
is the noun “cattle” (modified by the adjective “stampeding”) if we envision the cattle
stampeding on their own. But the subject is the gerund “stampeding” (with object
“cattle”) in the context of a troublemaker whose entertainment consists of starting
stampedes. Thus the sentence has more than one grammatical structure, and which
one is correct depends on the context.
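The two readings can be written down as data to show that the choice of grammatical structure is driven by context. The sketch below is purely illustrative: the PARSES table, its context labels, and the subject_of function are hypothetical, and a real parser would have to infer the context rather than be handed it.

```python
# Two candidate structures for "Stampeding cattle can be dangerous.",
# keyed by a hypothetical description of the context.
PARSES = {
    "cattle act on their own": {
        "subject_head": ("noun", "cattle"),
        "modifier": ("adjective", "stampeding"),
    },
    "someone starts stampedes": {
        "subject_head": ("gerund", "stampeding"),
        "object": ("noun", "cattle"),
    },
}


def subject_of(context):
    """Return the (part of speech, word) pair heading the subject in this context."""
    return PARSES[context]["subject_head"]


print(subject_of("cattle act on their own"))
print(subject_of("someone starts stampedes"))
```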
Another area of research in natural language processing concerns an entire
document rather than individual sentences. Here the problems of concern fall
into two categories: information retrieval and information extraction.
Information retrieval refers to the task of identifying documents that relate to
the topic at hand. An example is the problem faced by users of the World Wide
Web as they try to find the sites that relate to a particular topic. The current state
of the art is to search sites for key words, but this often produces an avalanche of
false leads and can overlook an important site because it deals with “automobiles”
instead of “cars.” What is needed is a search mechanism that understands
the contents of the sites being considered. The difficulty of obtaining such
understanding is the reason many are turning to techniques such as XML to produce
a semantic Web, as introduced in Section 4.3.
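The weakness of keyword matching can be demonstrated with a small sketch. The three "sites" in DOCS and the SYNONYMS table are invented for this example, and the two search functions are deliberately naive; they are meant to show the problem, not to model an actual search engine.

```python
# Hypothetical collection of sites, each reduced to a short text.
DOCS = {
    "site_a": "used cars for sale at low prices",
    "site_b": "a buyer's guide to automobiles and trucks",
    "site_c": "cable cars of san francisco",
}

# Hypothetical hand-built synonym table.
SYNONYMS = {"cars": {"cars", "automobiles"}}


def keyword_search(query, docs):
    """Return the names of the docs that contain the exact query word."""
    return sorted(name for name, text in docs.items()
                  if query in text.split())


def expanded_search(query, docs, synonyms):
    """Return the names of the docs that contain the query word or a listed synonym."""
    terms = synonyms.get(query, {query})
    return sorted(name for name, text in docs.items()
                  if terms & set(text.split()))


exact = keyword_search("cars", DOCS)
expanded = expanded_search("cars", DOCS, SYNONYMS)
print(exact)
print(expanded)
```

The exact search returns site_c, a false lead about cable cars, and overlooks site_b because it speaks of automobiles. Adding synonyms recovers site_b but does nothing about the false lead, which suggests why a mechanism that understands content, rather than one that merely matches words, is needed.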