Cache is designed in multiple levels. Level 1 cache is "closest" to the CPU and supports the fastest access. Level 2 cache is typically larger and a bit slower, while Level 3 cache, if used, is larger and possibly a bit slower still. Level 1 cache is a small block (typically around 64 KB); Level 2 cache is larger (between 512 KB and 2 MB).
Cache design becomes even more important with multi-core CPUs as multiple components are contending for use of a single resource (system memory). At each level, cache can either be discrete (available to one core only) or shared (available to all cores) depending on the processor model.
Addressing
The system bus between the CPU and memory consists of a data bus and an address bus. The width of the data bus (64-bit on all current CPUs) determines how much data can be transferred per clock cycle; the width of the address bus determines how many memory locations the PC can access.
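As a rough illustration of how data bus width relates to transfer capacity, the following minimal sketch computes bytes moved per clock cycle and a theoretical peak rate. The function names and the 200 MHz bus clock are assumptions chosen for the example and do not describe any particular CPU; real throughput is lower because of latency and protocol overhead.

```python
def bytes_per_cycle(data_bus_bits: int) -> int:
    """Bytes moved in one transfer across a bus of the given width."""
    return data_bus_bits // 8


def peak_transfer_rate(data_bus_bits: int, clock_hz: int) -> int:
    """Theoretical peak in bytes per second, assuming one transfer per clock cycle."""
    return bytes_per_cycle(data_bus_bits) * clock_hz


if __name__ == "__main__":
    # Hypothetical example: a 64-bit data bus clocked at 200 MHz.
    rate = peak_transfer_rate(64, 200_000_000)
    print(f"{bytes_per_cycle(64)} bytes per cycle, {rate / 1e9:.1f} GB/s peak")
```

Under these assumptions a 64-bit bus moves 8 bytes per cycle, so doubling either the bus width or the clock doubles the theoretical peak.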
The address bus for most 32-bit CPUs is either 32 or 36 bits wide. A 32-bit address bus can access a 4 GB address space; a 36-bit bus expands that to 64 GB. In theory, a 64-bit CPU could implement a 64-bit address space (16 exabytes). In practice, the current generation of x64 CPUs is "restricted" to a 40-bit address space (1 TB) to reduce the complexity of remaining compatible with 32-bit software.
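The address space figures follow from simple arithmetic: an address bus that is n bits wide can select 2^n distinct byte locations. The short sketch below works this out for the bus widths mentioned above, using binary units (1 KB = 1024 bytes); the helper names are illustrative only.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def address_space_bytes(address_bits: int) -> int:
    """Number of byte locations reachable with the given address width."""
    return 2 ** address_bits


def human_readable(size: int) -> str:
    """Express a power-of-two byte count in the largest whole binary unit."""
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    for bits in (32, 36, 40, 64):
        print(f"{bits}-bit address bus: {human_readable(address_space_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why moving from 32 to 36 bits multiplies the address space by 16 (4 GB to 64 GB), and why a full 64-bit space works out to 16 EB.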
Other CPU Features
Despite the architectural features discussed above, the speed at which the CPU runs is generally seen as a key indicator of performance. This is certainly true when comparing CPUs with the same architecture but is not necessarily the case otherwise. Intel Core 2 CPUs run at lower clock speeds than Pentium 4s, but deliver better performance.
Clock Speed and Overclocking
The core clock speed is the speed at which the CPU runs internal processes and accesses L1 and L2 cache. The Front Side Bus (FSB) is the interface between the CPU and system memory; the FSB speed is the speed at which that interface runs.
Overclocking increases the clock speed beyond the level set by the manufacturer in order to improve performance. When a manufacturer releases a new chip, it sets an optimum clock speed based on systems testing. This clock speed will be set at a level where damage to the chip is not likely to occur during normal operation.