4.2. Parallel and Distributed Execution Techniques
A number of techniques parallelize Java source code or bytecodes to improve the execution-time performance of an application. The parallelization is typically achieved through Java language-level support for multithreading, so these techniques preserve the portability of the transformed parallel Java programs. Most of these techniques exploit implicit parallelism in Java programs for parallel execution on shared-memory multiprocessor systems using Java multithreading and synchronization primitives. Some approaches instead extend the Java language itself to support parallel and distributed Java programs. The performance improvement obtained with these techniques depends on the amount of parallelism that can be exploited in the application program.
The High Performance Java project [Bik and Gannon 1997; Bik and Gannon 1998] exploits implicit parallelism in loops and multiway recursive methods to generate parallel code using the standard Java multithreading mechanism. The JAVAR [Bik and Gannon 1997] tool, which is a source-to-source restructuring compiler, relies on explicit annotations in a sequential Java program to transform the sequential source code into corresponding parallel code. The transformed program can be compiled into bytecodes using any standard Java compiler. The JAVAB [Bik and Gannon 1998] tool, on the other hand, works directly on Java bytecodes to automatically detect and exploit implicit loop parallelism. Since the parallelism is expressed in Java itself using Java’s thread libraries and synchronization primitives, the parallelized bytecodes can be executed on any platform with a JVM implementation that supports native threads.
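To make the flavor of these transformations concrete, the following hand-written sketch shows a loop with independent iterations distributed across standard Java threads, roughly the kind of code a restructuring tool such as JAVAR emits; it is not actual tool output, and all names are chosen for illustration.

// Hand-written sketch of a thread-parallel loop; the iterations are
// independent, so each thread can process its own block without locking.
public class ParallelLoop {
    public static void main(String[] args) throws InterruptedException {
        final double[] a = new double[1_000_000];
        final int numThreads = Runtime.getRuntime().availableProcessors();
        final int chunk = (a.length + numThreads - 1) / numThreads;
        Thread[] workers = new Thread[numThreads];

        for (int t = 0; t < numThreads; t++) {
            final int lo = t * chunk;
            final int hi = Math.min(lo + chunk, a.length);
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) {
                    a[i] = Math.sqrt(i);   // stand-in for the real loop body
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();   // barrier: sequential code resumes only after
        }               // every block of iterations has completed
    }
}

Because the parallelism is expressed entirely with java.lang.Thread, the transformed program remains an ordinary, portable Java program.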
The Java Speculative Multithreading (JavaSpMT) parallelization technique [Kazi and Lilja 2000] uses a speculative thread pipelining execution model to exploit implicit loop-level parallelism on shared-memory multiprocessors for general-purpose Java application programs. Its support of control speculation, combined with its run-time data dependence checking, allows JavaSpMT to parallelize a wide variety of loop constructs, including do-while loops. JavaSpMT is implemented using the standard Java multithreading mechanism. The parallelism is expressed by a Java source-to-source transformation.
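The following simplified sketch illustrates the general idea behind such speculative execution, with buffered computation, a run-time dependence check, and in-order commit; it is a pedagogical illustration under strong simplifying assumptions, not the JavaSpMT implementation. Each thread computes one loop iteration from a possibly stale input and validates that input when its turn to commit arrives.

import java.util.concurrent.atomic.AtomicInteger;

// Pedagogical sketch of speculative loop execution with in-order commit.
// A real system would track full read/write sets; this sketch simply
// re-checks the single input value each iteration depends on.
public class SpeculativeLoop {
    static final int N = 8;
    static final int[] shared = new int[N + 1];     // iteration i writes shared[i+1]
    static final AtomicInteger commitToken = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[N];
        for (int i = 0; i < N; i++) {
            final int iter = i;
            workers[i] = new Thread(() -> {
                // Speculative phase: read a possibly stale input and
                // compute the iteration's result into a private buffer.
                int speculativeInput = shared[iter];
                int buffered = compute(iter, speculativeInput);

                // Commit phase: wait until all earlier iterations have
                // committed, then perform the run-time dependence check.
                while (commitToken.get() != iter) Thread.onSpinWait();
                if (shared[iter] != speculativeInput) {
                    // Speculation failed: an earlier iteration changed our
                    // input after we read it, so squash and re-execute.
                    buffered = compute(iter, shared[iter]);
                }
                shared[iter + 1] = buffered;      // commit the result
                commitToken.incrementAndGet();    // pass the commit token on
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
    }

    static int compute(int iter, int input) {
        return input + iter;   // stand-in for an expensive loop body
    }
}

When an iteration’s input turns out to be unchanged, the speculative work commits directly and the iterations overlap; otherwise only the squashed iteration is re-executed.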
The Do! project [Launay and Pazat 1997] provides a parallel framework embedded in Java to ease parallel and distributed programming. The framework supports both data parallelism and control (or task) parallelism, and provides a model for parallel programming together with a library of generic classes. Relevant library classes can be extended for a particular application, or new framework classes can be defined to tune the application further.
Tiny Data-Parallel Java [Ichisugi and Roudier 1997] is a Java language extension for data-parallel programming. The language defines data-parallel classes whose methods are executed on a large number of virtual processors. An extensible Java preprocessor, EPP, translates the data-parallel code into standard Java code using the Java thread libraries and synchronization primitives (a generic sketch of such translated code follows below). The preprocessor can produce Java code for multiprocessor and distributed systems as well. However, the Tiny Data-Parallel Java language does not yet have sufficient language features to support high-performance parallel programs. DPJ [Ivannikov et al. 1997] defines a parallel framework through a Java class library for the development of data-parallel programs.
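As a generic illustration of the kind of standard Java code such a translation might produce (this is not actual EPP output, and the class is hypothetical), the sketch below maps one logical virtual processor to each array element, multiplexed onto a fixed pool of real threads:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical translation of a data-parallel class into plain Java:
// conceptually, every virtual processor executes the method body for
// its own element, with a latch acting as the end-of-method barrier.
class DataParallelArray {
    private final double[] data;
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    DataParallelArray(int n) { data = new double[n]; }

    void scale(final double factor) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(data.length);
        for (int i = 0; i < data.length; i++) {
            final int vp = i;                  // index of this virtual processor
            pool.execute(() -> {
                data[vp] *= factor;            // the data-parallel method body
                done.countDown();
            });
        }
        done.await();                          // barrier: all virtual processors done
    }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) throws InterruptedException {
        DataParallelArray a = new DataParallelArray(1000);
        a.scale(2.0);
        a.shutdown();
    }
}

Scheduling one task per element keeps the virtual-processor model explicit; a production translation would instead block the elements across the worker threads.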
4.3. Dynamic Compilation
Since the compilation time in JIT compilation adds directly to the application’s total execution time, the quality of the code optimization is severely constrained by compilation speed. Dynamic compilation addresses this problem by optimizing only the portions of the code that are executed most frequently, i.e., the program hotspots. Most programs spend the majority of their execution time in only a small fraction of their code, so optimizing only the hotspot methods should yield a large performance gain while keeping the compilation speed relatively fast.
Sun’s HotSpot JVM [The Java Hotspot Performance Engine Architecture] uses dynamic compilation to generate optimized native machine code at run time. The HotSpot engine contains both a run-time compiler and an interpreter. The first time a method is executed, it is interpreted by a profiling interpreter that gathers run-time information about the method. This information is used to detect hotspots in the program and to gather information about program behavior that can be used to optimize the generated native code in later stages of program execution. After the hotspot methods are identified, they are dynamically compiled into optimized native machine code. Infrequently executed code continues to be interpreted, decreasing the amount of time and memory spent on native code generation. Because a program is likely to spend the majority of its execution time in the hotspot regions detected by the interpreter, the compiler can spend more time optimizing the code generated for these sections than a JIT compiler could, while still producing an overall improvement in execution time. During code generation, the dynamic compiler performs conventional compiler optimizations, Java-specific optimizations, and inlining of static and dynamic methods. The inlining optimizations are designed to be reversible because of the problems associated with dynamic class loading.
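A minimal sketch of the counter-based hotspot detection that drives this mixed-mode execution appears below; the threshold, names, and bookkeeping are invented for illustration and do not reflect HotSpot’s actual internals.

import java.util.HashMap;
import java.util.Map;

// Illustrative mixed-mode runtime: methods start out interpreted, and a
// per-method invocation counter triggers compilation of hot methods.
class MixedModeRuntime {
    private static final int COMPILE_THRESHOLD = 10_000;   // invented value
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Map<String, Runnable> compiledCode = new HashMap<>();

    void invoke(String method, Runnable interpretedBody) {
        Runnable compiled = compiledCode.get(method);
        if (compiled != null) {
            compiled.run();                    // fast path: optimized code
            return;
        }
        int count = invocationCounts.merge(method, 1, Integer::sum);
        interpretedBody.run();                 // slow path: profiling interpreter
        if (count >= COMPILE_THRESHOLD) {
            // The method is hot: compile it once; later invocations
            // take the fast path above.
            compiledCode.put(method, compile(method, interpretedBody));
        }
    }

    private Runnable compile(String method, Runnable body) {
        return body;   // stand-in for optimizing compilation of the method
    }
}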
IBM’s Jalapeno JVM [Burke et al. 1999] includes an adaptive dynamic optimizing compiler that generates optimized machine code as the program executes. The Jalapeno JVM does not use an interpreter. Instead, the first execution of a method is handled by quickly compiling the method into unoptimized executable code. The dynamic optimizing compiler later generates optimized executable code from the bytecodes of the hotspot (i.e., frequently executed) methods, as determined by run-time profile information.
The Jalapeno optimizing compiler first compiles Java bytecodes into a virtual-register-based high-level intermediate representation (HIR) and generates a control flow graph for the method. After applying compiler optimizations to the HIR, it converts the HIR to a low-level intermediate representation (LIR) that includes references to specific implementation details, such as parameter-passing mechanisms and object layouts in memory. The LIR is then optimized further and converted to a native code representation using a Bottom-Up Rewrite System approach [Proebsting 1992]. Finally, this machine-specific representation (MIR) is optimized and assembled into machine-executable code. The optimizing compiler performs inlining both during bytecode-to-HIR translation and during HIR optimization. Inlining of virtual functions is handled by predicting the receiver’s type and performing a run-time check to determine the validity of the prediction. If the prediction is incorrect, the program performs a normal virtual function invocation; otherwise, it proceeds with the inlined function.
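The guard that makes this prediction safe can be shown in source form. In the sketch below (the classes are hypothetical, and a real compiler inserts the equivalent test in generated code rather than in source), the compiler has predicted that the receiver is a Circle, inlined Circle.area(), and guarded the inlined body with a cheap run-time type test:

// Sketch of guarded (speculative) inlining of a virtual call.
abstract class Shape { abstract double area(); }

final class Circle extends Shape {
    double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

class GuardedInlining {
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            if (s.getClass() == Circle.class) {
                // Guard succeeded: the inlined body of Circle.area()
                // runs without a virtual dispatch.
                Circle c = (Circle) s;
                sum += Math.PI * c.r * c.r;
            } else {
                sum += s.area();   // guard failed: normal virtual invocation
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Circle(2.0) };
        System.out.println(totalArea(shapes));
    }
}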
Program adaptation in Jalapeno starts with the instrumentation and recompilation of executable code. The executable code is instrumented to gather context-sensitive profile information to aid optimization, and this profile information is used to detect program hotspots. When a certain performance threshold is reached, the dynamic optimizing compiler is invoked to recompile the hotspot methods using context-specific optimizations at all levels of representation (HIR, LIR, and MIR). The unoptimized code is then replaced by code optimized on the basis of the collected profile information. Program adaptation continues in this cycle, with the executable code improving on each optimization iteration.
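A sketch of such an adaptive controller is given below; the optimization levels, thresholds, and names are invented for illustration and are not Jalapeno’s actual values.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative adaptive recompilation controller: instrumented code reports
// samples, and a method whose sample count crosses the threshold for its
// current level is recompiled at the next optimization level.
class AdaptiveController {
    private static final int[] RECOMPILE_THRESHOLDS = {1_000, 10_000, 100_000};
    private final Map<String, Integer> samples = new ConcurrentHashMap<>();
    private final Map<String, Integer> optLevel = new ConcurrentHashMap<>();

    // Called from the instrumentation in the running executable code.
    void recordSample(String method) {
        int count = samples.merge(method, 1, Integer::sum);
        int level = optLevel.getOrDefault(method, 0);
        if (level < RECOMPILE_THRESHOLDS.length
                && count >= RECOMPILE_THRESHOLDS[level]) {
            recompile(method, level + 1);      // replace the method's code
            optLevel.put(method, level + 1);
        }
    }

    private void recompile(String method, int newLevel) {
        // Stand-in for invoking the optimizing compiler at a higher level
        // and swapping the new code in for the old.
        System.out.println("recompiling " + method + " at level " + newLevel);
    }

    public static void main(String[] args) {
        AdaptiveController c = new AdaptiveController();
        for (int i = 0; i < 20_000; i++) c.recordSample("Example.hot");
    }
}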