Consider therefore the following more involved indexing scheme, which assigns the processors according to their index in such a way that only n of the 2n − 1 processors, that is, about half of them, have to read new data at each time step. This is the minimum possible, since a new data chunk is accessed at each time step. The remaining n − 1 processors keep the data they read when they were assigned to layer 0. This implies that no delay is caused by input commands during the log n consecutive steps required to process a chunk through the layers. The following explanation corresponds to the general case, not to the initial log n chunks.