Some freedom remains in how the processors are partitioned into subsets performing different tasks. We consider two scenarios. In the first, we ignore the time each processor needs to read its data, but allow for parameters that depend only on the index of the layer currently being processed and not on the particular chunk. This suggests a layout in which each processor is always assigned tasks at the same layer. In the second scenario, input operations are also taken into account, which leads to a layout in which the assignment of new data to a processor is kept to a minimum: a processor reads new data only after it has finished with the data it currently holds and that data has been released.
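The contrast between the two layouts can be illustrated with a minimal sketch. It is not the scheme used in this work, only a toy model under stated assumptions: processors, layers and chunks are numbered from zero, processors are dealt to layers (or chunks to processors) round-robin, and a "read" is counted whenever a processor starts a task on a chunk it does not already hold. All function names are hypothetical, and the sketch ignores timing and dependencies between layers.

```python
def layer_fixed_layout(num_processors, num_layers):
    """Scenario 1: map each processor to the single layer it always works at.

    Assumption: processors are dealt to layers round-robin.
    """
    return {p: p % num_layers for p in range(num_processors)}


def layer_fixed_schedule(num_processors, num_layers, num_chunks):
    """Scenario 1: list the (chunk, layer) tasks of each processor."""
    if num_processors < num_layers:
        raise ValueError("every layer needs at least one processor")
    layer_of = layer_fixed_layout(num_processors, num_layers)
    schedule = {p: [] for p in range(num_processors)}
    for layer in range(num_layers):
        # Processors that serve this layer share its chunks round-robin.
        group = [p for p in range(num_processors) if layer_of[p] == layer]
        for chunk in range(num_chunks):
            schedule[group[chunk % len(group)]].append((chunk, layer))
    return schedule


def chunk_fixed_schedule(num_processors, num_layers, num_chunks):
    """Scenario 2: a processor carries a chunk through every layer
    before it is given another chunk (assumed round-robin over chunks)."""
    schedule = {p: [] for p in range(num_processors)}
    for chunk in range(num_chunks):
        p = chunk % num_processors
        for layer in range(num_layers):
            schedule[p].append((chunk, layer))
    return schedule


def count_reads(tasks):
    """Count tasks that start on a chunk the processor does not hold."""
    reads = 0
    held = None
    for chunk, _layer in tasks:
        if chunk != held:
            reads += 1
            held = chunk
    return reads


def total_reads(schedule):
    """Sum the reads over all processors of a schedule."""
    return sum(count_reads(tasks) for tasks in schedule.values())
```

For example, with 4 processors, 2 layers and 4 chunks, `total_reads(layer_fixed_schedule(4, 2, 4))` gives 8 in this toy model, whereas `total_reads(chunk_fixed_schedule(4, 2, 4))` gives 4. In the first layout each processor needs only the parameters of its own layer; in the second, each chunk is read exactly once.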