Most burtons just pass through the cell on their
way to some destination determined by the location
of a target user variable or data structure. The cell
structure of CCA serves as a massive mesh
communication fabric. Burtons are wormhole routed
through the sequence of cells. The head of the burton
indicates its type, the logical entity for which it is
searching, and the physical destination it is moving
towards (or the direction it is moving in). As the
burton header enters a cell, the cell logic decides what
action to take and may perform an associative search
(probably completing in a single cycle) on the small
stack of tagged data blocks to determine whether the
target variable is local. If it is not, the cell passes
the burton on in its direction of travel with minimum
delay. However, the cell may also reroute the burton,
either through implicit traffic control (avoiding
conflicts with adaptive routing strategies) or through
explicit routing updates (“crumbs”: intermediate
redirection points for the search) for a variable or
block of data that may have moved from its assumed
location. The cell contains
sufficient control logic and data paths to carry out
these communication actions.
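To make the per-cell decision concrete, the following Python sketch models how one cell might handle an arriving burton header. It is illustrative only and rests on assumptions not stated in the text: a 2-D mesh with integer (x, y) coordinates, a dict lookup standing in for the single-cycle associative search, a crumb table mapping a tag to a redirected destination, and dimension-order (X then Y) routing standing in for the adaptive routing the text mentions. The names BurtonHeader, Cell, route, and walk are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

# Unit steps on the assumed 2-D mesh.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


@dataclass
class BurtonHeader:
    """Hypothetical header: type, searched-for entity, physical destination."""
    kind: str    # burton type, e.g. "read" (illustrative)
    target: str  # tag of the variable or data block being searched for
    dest: Coord  # assumed (x, y) mesh coordinates of the destination


@dataclass
class Cell:
    pos: Coord
    blocks: Dict[str, object] = field(default_factory=dict)  # tagged data blocks
    crumbs: Dict[str, Coord] = field(default_factory=dict)   # tag -> redirected destination

    def route(self, hdr: BurtonHeader) -> str:
        """Return 'deliver', 'miss', or the direction to forward the burton."""
        # Associative search: a dict lookup stands in for a single-cycle tag match.
        if hdr.target in self.blocks:
            return "deliver"
        # Explicit routing update: a crumb redirects the search elsewhere.
        if hdr.target in self.crumbs:
            hdr.dest = self.crumbs[hdr.target]
        x, y = self.pos
        dx, dy = hdr.dest
        # Dimension-order routing (X first, then Y); a stand-in for adaptive routing.
        if dx != x:
            return "east" if dx > x else "west"
        if dy != y:
            return "north" if dy > y else "south"
        return "miss"  # reached the destination, but the target is not here


def walk(mesh: Dict[Coord, Cell], hdr: BurtonHeader, start: Coord) -> Tuple[str, List[Coord]]:
    """Follow the header hop by hop; in wormhole routing the body trails this path."""
    pos, path = start, [start]
    while True:
        action = mesh[pos].route(hdr)
        if action in ("deliver", "miss"):
            return action, path
        sx, sy = STEP[action]
        pos = (pos[0] + sx, pos[1] + sy)
        path.append(pos)


if __name__ == "__main__":
    mesh = {(x, y): Cell((x, y)) for x in range(3) for y in range(3)}
    mesh[(2, 2)].blocks["x"] = 42      # variable x now lives at (2, 2)
    mesh[(2, 0)].crumbs["x"] = (2, 2)  # crumb left at its old location
    print(walk(mesh, BurtonHeader("read", "x", (2, 0)), (0, 0)))
```

In the demo, a burton aimed at the assumed home of x, (2, 0), passes straight through (1, 0), picks up the crumb at (2, 0), turns north, and is delivered at (2, 2), printing ('deliver', [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]). The sketch does not guard against cyclic crumbs.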
The cell also supports more sophisticated
operations on data for purposes of fine grain
synchronization and task management. These
capabilities realize the semantics of futures and
dataflow constructs, which may span multiple
contiguous cells. These actions operate on the synchronization
information embedded as part of the cell data blocks.
Other functions enable garbage collection, burton
creation, and advanced address management. Each
cell is constantly “aware” of its neighbors, and
together the cells accomplish basic functions such as
migrating data to make room for new data and managing
locality to improve performance by reducing latency.
A major goal of this project is to
determine and specify the precise set of capabilities
and their logic level realization within the boundaries
of each cell.
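The text leaves the encoding of the synchronization information open. As one possible reading, the Python sketch below models a future as a data block carrying a single full/empty bit plus a list of deferred continuations: a read of an empty block is parked, and every parked read is resumed when the value is written. The FutureBlock class, its fields, and the single-assignment rule are assumptions for illustration, not the CCA design.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class FutureBlock:
    """Hypothetical cell data block carrying an embedded full/empty bit."""
    full: bool = False
    value: Any = None
    # Continuations parked by reads that arrived before the value was produced.
    waiters: List[Callable[[Any], None]] = field(default_factory=list)

    def read(self, k: Callable[[Any], None]) -> None:
        if self.full:
            k(self.value)           # value present: continue immediately
        else:
            self.waiters.append(k)  # value absent: defer the reader

    def write(self, v: Any) -> None:
        if self.full:
            raise RuntimeError("future already resolved")  # assumed single assignment
        self.full, self.value = True, v
        pending, self.waiters = self.waiters, []
        for k in pending:           # release every deferred reader, in arrival order
            k(v)


if __name__ == "__main__":
    results = []
    f = FutureBlock()
    f.read(results.append)                   # deferred
    f.read(lambda v: results.append(v * 2))  # deferred
    f.write(21)                              # releases both readers
    f.read(results.append)                   # immediate
    print(results)  # [21, 42, 21]
```

Keeping the waiter list inside the block mirrors the idea that synchronization state travels with the data rather than living in a separate scheduler.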
Since the physical resources encompassed by a
domain whose W factor does not exceed 1 will
likely suffice to build a single cell, all intra-cell
operations can be executed synchronously. This has
the added benefit of allowing today's layout and logic
design tools to be reused and, by eliminating (or at
least significantly reducing) the number of pipelined
blocks, cuts down on resources not directly related
to the functionality of the cell. By contrast,
inter-cell communications are expected to cross
the W = 1 boundary and thus require an explicitly
asynchronous design.
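The section does not specify which asynchronous protocol the inter-cell links would use. As an illustration only, the Python sketch below assumes a four-phase (return-to-zero) bundled-data handshake, a common choice in asynchronous design: the sender drives the data and raises req, the receiver latches the data and raises ack, and both wires then return to zero. A random scheduler interleaves the two sides to stand in for arbitrary, unknown wire and logic delays. The names Channel, sender, receiver, and run are hypothetical.

```python
import random


class Channel:
    """Shared wires between two neighboring cells: bundled data plus req/ack."""
    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None


def sender(ch, items):
    """Generator: each yield is a point where the sender waits on its neighbor."""
    for item in items:
        ch.data = item          # phase 1: drive data, then raise req
        ch.req = 1
        while ch.ack == 0:      # phase 2: wait until the receiver has latched
            yield
        ch.req = 0              # phase 3: return req to zero
        while ch.ack == 1:      # phase 4: wait for ack to return to zero
            yield


def receiver(ch, out, n):
    for _ in range(n):
        while ch.req == 0:      # wait for valid data
            yield
        out.append(ch.data)     # latch data while req is high
        ch.ack = 1
        while ch.req == 1:      # wait for the sender to drop req
            yield
        ch.ack = 0


def run(items, seed=0):
    """Interleave both sides in random order to mimic unknown relative delays."""
    rng = random.Random(seed)
    ch, out = Channel(), []
    procs = [sender(ch, items), receiver(ch, out, len(items))]
    while procs:
        p = rng.choice(procs)
        try:
            next(p)
        except StopIteration:
            procs.remove(p)
    return out


if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 2, 3]
```

Because neither side assumes how long the other takes, every item is transferred exactly once regardless of the interleaving, which is the property an asynchronous inter-cell link must provide across the W = 1 boundary.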