The I/O clock is not distributed as a clock at the top level; it
exists as a clock only within clusters. A phase signal (io_phase),
which toggles at a rate reduced from the CMP rate, is pipelined to
each cluster and re-timed there to l2clk. Pipelining this common
phase signal is cheaper and more efficient than a custom top-level
clock distribution. The cluster headers perform the re-timing, clock
gating, muxing, and related DFT functions, then drive the clock grids
through pre-grid drivers.
As a result, iol2clk is derived from l2clk within every cluster, so
the iol2clk-l2clk skew is comparable to the l2clk-l2clk skew across
clusters. By contrast, the CMP and DR clock latencies are only
loosely matched, so the two domains may exhibit large inter-domain
skew within a single MCU. Here, large skew means skew approaching
one CMP cycle time.
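The skew argument above can be illustrated with a toy timing model.
This is a minimal sketch under stated assumptions, not the actual
header logic: the CMP period, the io_phase toggle interval, the
per-cluster and DR latencies, the DR:CMP ratio, and the edge-detect
gating scheme are all hypothetical, as are the helper names
cluster_clocks, dr_clock, and skew. Header gate delay is ignored.
Because each iol2clk edge is a gated copy of a local l2clk edge, its
skew to l2clk in another cluster equals the l2clk-l2clk skew, while
the separately distributed DR clock carries its full latency mismatch.

```python
CMP_PERIOD_PS = 800.0  # hypothetical l2clk (CMP) period
IO_TOGGLE_CYCLES = 4   # hypothetical: io_phase toggles every 4 CMP cycles


def io_phase(cycle: int) -> int:
    """Common phase signal: a level that toggles every IO_TOGGLE_CYCLES CMP cycles."""
    return (cycle // IO_TOGGLE_CYCLES) % 2


def cluster_clocks(latency_ps: float, n_cycles: int) -> tuple[dict[int, float], dict[int, float]]:
    """Return (l2clk, iol2clk) rising-edge arrival times in ps, keyed by CMP cycle.

    Hypothetical header: io_phase is re-timed by one flop clocked by l2clk,
    and the iol2clk gate opens for one l2clk edge whenever the re-timed value
    changes. Gate delay is ignored, so every iol2clk edge is an l2clk edge.
    """
    l2clk: dict[int, float] = {}
    iol2clk: dict[int, float] = {}
    prev = io_phase(0)
    for cycle in range(1, n_cycles):
        edge = cycle * CMP_PERIOD_PS + latency_ps  # arrival at this cluster's grid
        l2clk[cycle] = edge
        sampled = io_phase(cycle - 1)  # value captured on the previous l2clk edge
        if sampled != prev:
            iol2clk[cycle] = edge  # gated copy of the local l2clk edge
        prev = sampled
    return l2clk, iol2clk


def dr_clock(latency_ps: float, n_cycles: int, ratio: int) -> dict[int, float]:
    """Separately distributed DR clock: one edge every `ratio` CMP cycles (hypothetical)."""
    return {c: c * CMP_PERIOD_PS + latency_ps for c in range(1, n_cycles) if c % ratio == 0}


def skew(a: dict[int, float], b: dict[int, float]) -> float:
    """Worst arrival-time difference over nominally coincident edges (same CMP cycle)."""
    common = a.keys() & b.keys()
    return max(abs(a[c] - b[c]) for c in common)


l2_a, io_a = cluster_clocks(latency_ps=100.0, n_cycles=32)
l2_b, _ = cluster_clocks(latency_ps=130.0, n_cycles=32)
dr_mcu = dr_clock(latency_ps=700.0, n_cycles=32, ratio=2)

print(skew(io_a, l2_a))                    # 0.0: iol2clk vs. its own l2clk
print(skew(l2_a, l2_b))                    # 30.0: l2clk-l2clk across clusters
print(skew(io_a, l2_b))                    # 30.0: iol2clk-l2clk across clusters
print(skew(dr_mcu, l2_a) / CMP_PERIOD_PS)  # 0.75: DR vs. CMP, in CMP cycles
```

With these assumed numbers, the iol2clk skew never exceeds the 30 ps
cluster-to-cluster l2clk skew, whereas the loosely matched DR clock
lands 0.75 of a CMP cycle away from l2clk, the kind of skew the text
calls large.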