1 Introduction
When designing digital embedded systems for use in cellular phones, laser printers, etc., performance is a critical issue. A well-known technique for meeting performance constraints is software speed-up, illustrated in figure 1. If the application cannot comply with its performance constraints when implemented solely on the processor, the time-consuming parts of the application are extracted and executed on dedicated hardware, the ASIC. The target architecture for this type of software speed-up is co-processor based, i.e. a single processor and one or more ASICs. This type of target architecture has been used successfully for application speed-up in several co-synthesis systems, such as COSYMA [2], Vulcan [3] and the LYCOS system [9]. The ASIC is implemented as a data-path composed of functional units such as adders, multipliers, etc., and a controller that controls the computation in the data-path. Figure 1 shows this structure; in the example, the hardware data-path is composed of two adders, one subtractor and one multiplier. We call this an allocation of hardware resources: two adders, one subtractor and one multiplier have been allocated to the hardware data-path.
In the LYCOS system, software speed-up is achieved by partitioning the application onto the preselected target architecture using the PACE algorithm [7]. The input to the partitioning tool is the application and the aforementioned target architecture. The hardware/software partitioning maps the non-time-critical parts of the application to software (i.e. the processor in the target architecture), while the most time-critical parts are mapped to the ASIC in order to achieve the software speed-up. As figure 1 shows, the target architecture must be fixed before the partitioning can take place. This includes selecting the processor and allocating the type and number of hardware resources for the hardware data-path (a memory-mapped communication scheme between hardware and software is assumed).
This paper presents a technique that, prior to partitioning, allocates the hardware resources for the hardware data-path. This is a key step towards achieving the best possible speed-up once the hardware/software partitioning has been done. The preallocation of the data-path resources takes characteristics of the application into account, knowing that the application will subsequently be partitioned between hardware and software. The allocations generated by the algorithm come very close to the optimal allocations. An optimal allocation ensures that the hardware/software partitioning generated by the PACE algorithm achieves maximum speed-up. However, finding the optimal allocation for a given application (manually or by exhaustive search) is an extremely time-consuming task due to the very large number of different allocations.
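To give a feel for why exhaustive search over allocations scales poorly, the following minimal sketch represents an allocation as a mapping from resource type to unit count and enumerates every candidate under a per-type upper bound. The bounds in `MAX_UNITS` and the function names are illustrative assumptions, not values or interfaces from the LYCOS system; only the figure 1 allocation (two adders, one subtractor, one multiplier) comes from the text.

```python
from itertools import product
from math import prod

# Hypothetical upper bounds on the number of units per resource type
# (assumed for illustration; not taken from the paper).
MAX_UNITS = {"adder": 4, "subtractor": 4, "multiplier": 2}

# The allocation described for figure 1.
FIGURE_1_ALLOCATION = {"adder": 2, "subtractor": 1, "multiplier": 1}


def count_allocations(max_units):
    """Count allocations when type t may receive 0..max_units[t] units."""
    return prod(bound + 1 for bound in max_units.values())


def enumerate_allocations(max_units):
    """Yield every allocation as a dict mapping resource type to unit count."""
    types = list(max_units)
    ranges = (range(max_units[t] + 1) for t in types)
    for counts in product(*ranges):
        yield dict(zip(types, counts))
```

Under these assumed bounds the count is the product of (bound + 1) over all resource types, so it grows multiplicatively with every additional resource type or larger bound. An exhaustive search would have to evaluate each candidate in turn, which is what makes it impractical for realistic libraries of functional units.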