devices will be accessible by a round-trip signal in a
single clock cycle. Thus, any nanoscale chip will be
inherently balkanized into myriad subdomains of
local action; this is largely independent of the
architecture devised. To exploit all these domains, an
extremely high degree of application parallelism will
have to be available. In all likelihood, this will
require effective use of fine-grain algorithmic
parallelism, which in turn requires very lightweight
mechanisms to minimize the temporal overhead of
managing these concurrent tasks and parallel
resources (again, within these local domains).
Coordination through synchronization must now be
very lightweight as well. Further, slight variations in
structure can cause signal skew, so it will become very
difficult to achieve truly synchronous operation. And
none of this takes into account main memory,
which, if classical architectures were used,
would be thousands of clock cycles away. Waiting
for remote responses would dominate any use of
conventional architectural techniques.
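
The effect of remote latency can be illustrated with a simple
back-of-the-envelope model. The sketch below is not taken from
the text; it assumes that each task computes for a fixed number
of cycles and then waits a fixed number of cycles for a remote
response, and that switching between tasks costs nothing. The
function names and the figures of 100 and 2000 cycles are
hypothetical, chosen only to match the "thousands of clock
cycles" order of magnitude.

```python
import math


def utilization(work_cycles, latency_cycles, tasks=1):
    """Fraction of cycles an execution unit spends on useful work.

    Assumed model: every task computes for work_cycles, then waits
    latency_cycles for a remote response. With several tasks in
    flight, the unit runs another ready task while one is waiting;
    task switching is assumed to be free.
    """
    if work_cycles <= 0 or latency_cycles < 0 or tasks < 1:
        raise ValueError("work_cycles > 0, latency_cycles >= 0, tasks >= 1")
    busy = tasks * work_cycles
    period = work_cycles + latency_cycles
    return min(1.0, busy / period)


def tasks_to_hide_latency(work_cycles, latency_cycles):
    """Smallest number of tasks in flight that keeps the unit fully busy."""
    if work_cycles <= 0 or latency_cycles < 0:
        raise ValueError("work_cycles > 0, latency_cycles >= 0")
    return math.ceil((work_cycles + latency_cycles) / work_cycles)


if __name__ == "__main__":
    # Hypothetical figures: 100 cycles of local work per remote access,
    # 2000 cycles of round-trip latency to main memory.
    print(utilization(100, 2000))
    print(tasks_to_hide_latency(100, 2000))
```

Under these assumptions a single task keeps the unit busy for
under 5 percent of the time, and 21 concurrent tasks are needed
to hide the latency completely. This is one way to see why a
high degree of parallelism and very lightweight task management
go together.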