Thread synchronization is a potential performance problem in many Java programs that use multithreading. Since Java libraries are implemented in a thread-safe manner, the performance of even singlethreaded applications may be degraded due to synchronization. In Java, synchronization is provided through monitors, which are language-level constructs used to guarantee mutually-exclusive access to shared data-structures [Silberschatz and Galvin 1997]. Unfortunately, monitors are not efficiently implemented in the current Sun JDK. Since Java allows any object to be synchronizable (with or without any synchronized methods), using a lock structure for each object can be very costly in terms of memory. To minimize memory requirements, the Sun JDK keeps monitors outside of the objects. This requires the run-time system to first query each monitor in a monitor cache before itis used, which is quite inefficient. Further, the monitor cache itself needs to be locked during these queries to avoid race conditions. Thus, this monitor implementation approach is not scalable. CACAO, in contrast, is a system that implements monitors using a hash table indexed by the address of the associated object [Krall and Proebst 1998]. Because only a small number of objects that require mutually-exclusive access (called a mutexobject) are locked at any given time, this approach generally leads to decreased memory usage. Each of these objects maintains a count of how many lock operations have been performed on it without any corresponding unlock operations. When the count goes to zero, the lock is not owned by any thread. A mutex object, however, is not destroyed when its count reaches zero under the assumption that the mutex object will be reused again soon. The mutex object is destroyed only when a new mutex object collides with it in the hash table. This memory allocation approach has the advantage of requiring less code to implement lock and unlock operations. Since the hash table entries will be free only at the beginning of the program and will never be freed again, the code to lock the mutex need not check if the entry is free. The unlocking code also does not need to check whether the hash entry should be freed. Thethin locks approach in IBM’s JDK 1.1.2 for AIX improves thread synchronization by optimizing lock operations for the most common cases [Bacon et al. 1998].Thin locks, which require only a partial word per object, are used for objects that are not subject to any contention, do not have wait/notify operations performed on them, and are not locked to an excessive nesting depth. Objects that do not meet the criterion usefat locks, which are multiword locks similar to those used in the standard Java implementation. To minimize the memory required for each object’s lock, 24 bits in the object’s header are used for the lock structure. The other values stored in the header are compacted using various encoding schemes to make the 24 bits available for this lock structure. This 24-bit field stores either a thin lock directly, or a pointer to a fat lock. The dedicated lock for each object allows the lock/unlock operations on the thin locks to be performed using only a few machine instructions. This approach also eliminates the need to synchronize the monitor cache.