To appreciate this, we suggest a new parameter, τ (tau),
that quantitatively reflects the disparate properties of
future nanoscale devices. τ is the ratio of the number of
gates on a chip to the number of gates through which a
signal can propagate round trip in a single clock cycle.
This does not assume a linear sequence of gates but a
two-dimensional set with a radial distribution. Thus, τ
may be as small as 1, where all gates on a chip (or
module) can be accessed within one clock cycle. For a
modern CMOS microprocessor, τ can be approximately 10 or
slightly greater. But even today, experimental
superconductor Rapid Single Flux Quantum (RSFQ) logic
fabricated with niobium technology, with clock rates
> 100 GHz, exhibits a τ of approximately 1000. Future
nanoscale technologies, when fully realized, will deliver
a τ of between ten thousand and a million, depending on a
number of implementation-specific details. Architectural
ideas that work for processors with τ in the range of 1 to
10 will fail when τ soars to five orders of magnitude or
more. The purpose of the proposed research is to develop
radical architecture structures that will enable nanoscale
and other advanced technologies to effectively perform
general-purpose applications. It is expected that, should
this research be successful, the proposed architecture
concepts in synergy with future nanoscale device
technology will yield a new generation of computing
systems capable of multiple exaflops of sustained
performance.
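
As a rough illustration of the definition of τ, the
following Python sketch computes the ratio from a gate
count and a reachable-gate count. The function names, the
constant signal speed, and the uniform two-dimensional gate
density are assumptions made for this example only; they
are not taken from the proposal, and the sketch is not a
device model.

```python
import math


def reachable_gates(gate_density, signal_speed, clock_period, total_gates):
    """Gates a signal can reach and return from within one clock cycle.

    Assumed model (not from the source text): the signal moves at a
    constant speed, so the round-trip radius is speed * period / 2, and
    gates are spread uniformly over a two-dimensional surface.
    """
    radius = signal_speed * clock_period / 2.0
    in_disc = math.pi * radius ** 2 * gate_density
    # A signal cannot reach more gates than the chip holds.
    return min(in_disc, total_gates)


def tau(total_gates, reachable):
    """Ratio of gates on the chip to gates reachable round trip per cycle."""
    if reachable <= 0:
        raise ValueError("reachable must be positive")
    return total_gates / reachable
```

With this definition, a chip on which every gate is
reachable in one cycle gives τ = 1, and shrinking the
reachable region while the gate count grows drives τ
upward.
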
A set of architecture concepts collectively referred to as
“Continuum Computer Architecture” (CCA), so named because
it approximates an ideal continuous-space, continuous-time
medium of execution but in discretized form, has been
derived to address the challenges of nanoscale
technologies and the end of Moore’s Law. CCA is both a
parallel model of computation and an implicit, highly
parallel hardware structure incorporating local mechanisms
invented to enable efficient performance of the CCA
execution model. CCA is intended to operate effectively in
the high-τ regime. Its structures reflect the locality of
action in two ways. First, a site of instruction execution
has very few resources, but all that are necessary for a
single instruction to be performed. CCA dispenses with the
concept of “the processor” and instead merges logic,
communication, and state storage into a single physical
element. Second, all such elements are message-driven and
perform split-transaction execution. An element performs
an action upon the arrival of a packet, called a parcel,
that requests access to the element’s local state and may
modify it using the element’s own local logic. To those
familiar with classic cellular automata, CCA would appear
superficially similar. Both exploit locality of action
based on local rules and local or nearest-neighbor state.
Both achieve an emergent global behavior from the
effective synergy of myriad simple interacting local
elements. But where the effect of cellular automata is
often the special-purpose mimicking of some physical
phenomenon, like thermal diffusion through a gaseous
medium, or some more abstract effect, like the Game of
Life, the effect of CCA through the symbiosis of its
interacting elements is a global, general-purpose parallel
computing discipline to govern the execution of any
general problem.
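
The message-driven, split-transaction behavior described
in this paragraph can be sketched in a few lines of
Python. This is only a toy model under stated assumptions:
the parcel fields (target, action, operand, reply_to), the
three action names, and the single delivery queue are
hypothetical and are not taken from the CCA design itself.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parcel:
    """Hypothetical parcel layout: destination, action, operand, continuation."""
    target: int
    action: str
    operand: int = 0
    reply_to: Optional[int] = None


class Element:
    """One site that merges state storage, logic, and communication."""

    def __init__(self):
        self.state = 0

    def handle(self, parcel):
        """Apply the parcel's action to local state; return follow-on parcels."""
        if parcel.action == "add":
            self.state += parcel.operand
        elif parcel.action == "store":
            self.state = parcel.operand
        elif parcel.action == "read" and parcel.reply_to is not None:
            # Split transaction: nothing waits for the value. It travels on
            # as a new parcel addressed to the continuation element.
            return [Parcel(parcel.reply_to, "store", self.state)]
        return []


def run(elements, parcels):
    """Deliver parcels until none remain; an element acts only on arrival."""
    queue = deque(parcels)
    while queue:
        parcel = queue.popleft()
        queue.extend(elements[parcel.target].handle(parcel))
    return [element.state for element in elements]
```

For example, sending Parcel(0, "add", 5), then
Parcel(0, "read", reply_to=2), then Parcel(1, "store", 7)
to three fresh elements leaves their states at 5, 7, and
5: the read of element 0 completes as a separate store
parcel delivered to element 2.
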
A more detailed description of a possible CCA model of
execution, and of the logical structure that supports it,
is presented later. Here we identify some of the critical
benefits anticipated from Continuum Computer Architecture
for nanoscale technology:
• Organizes all computing actions in local domains of
  action
• Exposes mammoth memory bandwidth to overcome memory wall
• Permeates the system with arithmetic-logic units to
  eliminate sources of potential contention and latency
  increases inherent in typical centralized approaches
• Exploits and exposes the full potential of nanoscale
  processing capability
• Employs an asynchronous message-driven computing model
  for split-transaction execution and latency hiding
