The compiler first performs a linear scan of the bytecodes to record stack and variable information for use in later stages. This is followed by a global register allocation, code generation, code emission, and code patching pass. The compiler uses two alternative schemes for global register allocation. The first scheme allocates the 4 callee-saved registers to the variables with the highest static reference counts for the duration of the method. The second scheme iterates through all of the variables in order of decreasing static reference counts, allocating a register to a variable if the register is available in all of the basic blocks that the variable is referenced.
CSE is implemented by matching nonoverlapping subsequences in expressions. Basic blocks are represented as a string of bytes where common subexpressions are detected as duplicate substrings in the string. Any two matched substrings of less than sixteen bytecodes is considered as a candidate for elimination. If the value calculated during the first expression instance is still available during the second instance of the expression, the previous value is reused and the duplicate expression is eliminated. In this scheme, transitivity of expressions is not considered. Therefore, the expression “xCy” would not match with “yCx.”
Unnecessary array bounds checks are detected by keeping track of the maximum constant array index that is bounds checked and eliminating index bound checks for constant values less than the maximum constant array index check. For example, if array location 10 is accessed before location 5, the bounds check for array location 5 is eliminated. This optimization is useful during array initialization since the array creation size is considered a successful bounds check of the highest array index. This eliminates bound checks for any initialization of array values using constant array indices. However, elimination of bounds checks does not apply to variables nor does it extend beyond the scope of a basic block.
Code for handling thrown exceptions is moved to the end of the generated code for a method. By placing exception handling code at the end of the method, the static branch predictor on Pentium processors will predict the branches to be not taken and the exception handling code is less likely to be loaded into a cache line. Since exceptions do not occur frequently, this is well-suited for the common case execution.
The Open JIT project [Matsuoka et al. 1998] is a reflective JIT compiler written in Java. The compiler is reflective in the sense that it contains a set of selfdescriptive modules that allow a user program to examine the internal state of the compiler’s compilation and modify the state through a compiler-supplied interface. This interface allows user programs to perform program-specific optimizations by defining program-specific semantics. For instance, defining properties of commutivity and transitivity for a user object provides the compiler with more information to perform useful optimizations.
The open-source Kaffe project provides a JIT compiler for the Kaffe JVM that supports various operating systems on the x86, Sparc, M68k, MIPS, Alpha, and PARisc architectures [Wilkinson, Kaffe v0.10.0]. The Kaffe JIT compiler uses a machine-independent front-end that converts bytecodes to an intermediate representation called the KaffeIR. The KaffeIR is then translated using a set of macros which define how KaffeIR instructions map to native code.
The AJIT compilation systemgenerates annotations in bytecode files to aid the JIT compilation [Azevedo et al. 1999]. A Java to bytecode compiler generates annotations that are stored as additional code attributes in generated class files to maintain compatibility with existing JVMs. These annotations carry compiler optimization-related information that allow the JIT compiler to generate optimized native code without extensive runtime analysis. An example of a generated attribute in the AJIT system is the mapping of variables to an infinite virtual register set. This virtual register allocation (VRA) annotation is used by an annotation-reading JIT compiler to speed up register allocation and to identify unnecessary duplication of stack variables.
CACAOis a stand-alone JIT compiler for the DEC ALPHA architecture [Krall and Grafl 1997]. The CACAO compiler translates bytecodes to an intermediate representation, performs register allocation, and replaces constant operands on the stack with immediate instruction operands.
Fajita[FAJITA] is a variation of a JIT compiler that runs as a Java compilation server on a network, independent of the Java runtime. The server compiles Java bytecode as a pass-through proxy server, allowing class compilation to be cached and shared among different programs and machines. Since the lifetime of compiled code in this environment is generally much longer than a standard JIT, this compiler has more time to perform advanced optimizations.