continue to shrink, global interconnect wire delay does not scale accordingly with technologies. The increasing wire delays have become one major impediment for performance improvement. Compared to a traditional two dimensional chip design, one of the important benefits of a 3D chip over a traditional two-dimensional (2D) design is the reduction on global interconnects. It has been shown that three-dimensional architectures reduce wiring length by a factor of the square root of the number of layers used [16]. The reduction of wire length due to 3D integration can result in two obvious benefits: latency improvement and power reduction. For example, since interconnects dominate the delay of cache accesses which determines the critical path of a microprocessor, and the regular structure and long wires in a cache make it one of the best candidates for 3D designs, 3D cache design is one of the early design example for fine-granularity 3D partition [17],and the latency reduction can be as much as 25% for a twolayer 3D cache. 3D arithmetic-component designs also show latency benefits. For example, various designs [18] have shown that the 3D arithmetic unit design can achieve around 6%-30% delay reduction due to the wire length reduction.