As shown in Fig. 1, the SPCU consists of multiple PEs, with each PE corresponding to a node in the graph. All PEs receive the instruction code from the host and the input data from the bi-directional data bus in parallel. However, at any given time only one PE can drive data onto the data bus.
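
The bus discipline described here (broadcast reception by all PEs, a single driver at a time) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the SPCU's actual design: the class and method names (`SPCU`, `PE`, `drive`, `end_cycle`, `BusContentionError`) are hypothetical, the notion of a "cycle" is an assumed abstraction, and the model assumes that every PE, including the driving one, latches the value on the bus.

```python
class BusContentionError(RuntimeError):
    """Raised when a second PE tries to drive the bus in the same cycle."""


class PE:
    """One processing element, corresponding to one graph node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.instruction = None   # last instruction code received from the host
        self.received = []        # values latched from the shared data bus


class SPCU:
    """Behavioral model: one PE per graph node, one shared data bus."""

    def __init__(self, num_nodes):
        self.pes = [PE(i) for i in range(num_nodes)]
        self.driver = None        # node id of the PE driving the bus, if any

    def broadcast_instruction(self, code):
        # The host's instruction code reaches every PE in parallel.
        for pe in self.pes:
            pe.instruction = code

    def drive(self, node_id, value):
        # Only one PE may drive the bus at any given time.
        if self.driver is not None:
            raise BusContentionError(
                f"PE {node_id} cannot drive: PE {self.driver} owns the bus"
            )
        self.driver = node_id
        # Assumption: all PEs (including the driver) latch the bus value.
        for pe in self.pes:
            pe.received.append(value)

    def end_cycle(self):
        # Release the bus so that another PE can drive it next.
        self.driver = None


if __name__ == "__main__":
    spcu = SPCU(num_nodes=3)
    spcu.broadcast_instruction("ADD")
    spcu.drive(0, 7)              # PE 0 drives; every PE latches 7
    spcu.end_cycle()
    spcu.drive(1, 9)              # PE 1 may drive once the bus is released
    print([pe.received for pe in spcu.pes])
```

In this model, a second call to `drive` before `end_cycle` raises `BusContentionError`, which mirrors the single-driver constraint; how the real hardware arbitrates between PEs is not specified here.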