Parsing the Program into Parse Trees
Every program that is free of syntax errors is converted into a parse tree by a
parser before any comparison is made. The parser ignores comments, import
statements, whitespace and line breaks in the program.
A parse tree is composed of nodes and tokens, and its structure is well
defined. Each parse tree represents a single complete program, and all the
essential data of the program is stored in the parse tree. Figure 1 shows the
conversion of a simple Java program into a parse tree.
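
To make the node-and-token structure concrete, the following is a minimal sketch of how such a tree might be represented in Java. It is not the data structure produced by our parser. The class names ParseTreeSketch, Element, Node and Token and the helper methods are hypothetical, and the node kinds CompilationUnit and ClassBody are assumed for illustration. Only UnmodifiedClassDeclaration is taken from the text above. The tree is built by hand for the source text "class Hello { }".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParseTreeSketch {

    /** Anything that can appear in the tree: an interior node or a leaf token. */
    public interface Element {}

    /** Leaf: one lexical token, stored by its source text. */
    public static final class Token implements Element {
        public final String image;
        public Token(String image) { this.image = image; }
    }

    /** Interior node: a grammar construct with its children in source order. */
    public static final class Node implements Element {
        public final String kind;
        public final List<Element> children = new ArrayList<>();
        public Node(String kind, Element... children) {
            this.kind = kind;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** Hand-built tree for "class Hello { }" (node kinds partly assumed). */
    public static Node example() {
        return new Node("CompilationUnit",
            new Node("UnmodifiedClassDeclaration",
                new Token("class"),
                new Token("Hello"),
                new Node("ClassBody", new Token("{"), new Token("}"))));
    }

    /** Collects the token texts left to right with a depth-first walk. */
    public static List<String> tokens(Element e) {
        List<String> out = new ArrayList<>();
        collect(e, out);
        return out;
    }

    private static void collect(Element e, List<String> out) {
        if (e instanceof Token) {
            out.add(((Token) e).image);
        } else {
            for (Element child : ((Node) e).children) collect(child, out);
        }
    }

    /** Renders the tree one element per line, indented two spaces per level. */
    public static String render(Element e) {
        StringBuilder sb = new StringBuilder();
        render(e, 0, sb);
        return sb.toString();
    }

    private static void render(Element e, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        if (e instanceof Token) {
            sb.append('"').append(((Token) e).image).append('"').append('\n');
        } else {
            Node n = (Node) e;
            sb.append(n.kind).append('\n');
            for (Element child : n.children) render(child, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(example()));
    }
}
```

Running main prints the tree as an indented outline, with node kinds unquoted and tokens quoted. Because comments, import statements and whitespace never reach the tree, two programs that differ only in those respects would yield the same structure in this representation.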
Different kinds of nodes in the parse tree represent different parts of the
program; for example, UnmodifiedClassDeclaration stands for the class
declaration of the program (see Figure 1). The parser that converts a program
into a parse tree is language-dependent. It requires a grammar, that is, a set
of rules describing how a program can be constructed. Apart from the
grammar, a Java-based parser generator is required to generate a
language-specific parser. JavaCC (WebGain 2000; Lee 2002) is used as our parser
generator.
Figure 1. Example of converting a Java program into a parse tree.