the S-box and inverse S-box using logical operations instead of LUTs. In our rearrangement, the 4 lanes within one
row correspond respectively to the 4 input/output bits of the S-boxes. Since 2 lanes share one 8-bit register, 8 S-boxes
can be computed in parallel by a sequence of logical instructions operating on 8-bit registers.
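To make the register layout concrete, here is a minimal Python sketch. It assumes that each of four 8-bit values holds bit i of eight S-box inputs, mirroring the 8-bit registers described above. It uses the standard PRINCE S-box table and derives each output bit generically from its algebraic normal form (ANF) with AND/XOR only. This is not the optimized 17-term circuit found with the search tool [41], only an illustration that a fixed logical sequence evaluates 8 S-boxes at once. The helper names (`pack`, `unpack`, `compile_bitsliced`, `eval_bitsliced`) are hypothetical.

```python
# PRINCE 4x4 S-box and its inverse as nibble lookup tables (reference only).
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]
INV_SBOX = [SBOX.index(y) for y in range(16)]


def anf(truth):
    """Moebius transform: 16-entry truth table -> ANF coefficients."""
    c = list(truth)
    for i in range(4):
        step = 1 << i
        for x in range(16):
            if x & step:
                c[x] ^= c[x ^ step]
    return c


def compile_bitsliced(table):
    """For each output bit j, list the monomials (input-bit masks) in its ANF."""
    monos = []
    for j in range(4):
        truth = [(table[x] >> j) & 1 for x in range(16)]
        coeffs = anf(truth)
        monos.append([m for m in range(16) if coeffs[m]])
    return monos


def eval_bitsliced(monos, regs):
    """regs[i] is an 8-bit value holding bit i of 8 S-box inputs.
    Only AND and XOR on whole 8-bit values are used, so all 8 S-boxes
    are evaluated by the same instruction sequence."""
    out = []
    for terms in monos:
        acc = 0
        for m in terms:
            t = 0xFF  # the constant-1 monomial, replicated in all 8 slices
            for i in range(4):
                if (m >> i) & 1:
                    t &= regs[i]
            acc ^= t
        out.append(acc)
    return out


def pack(nibbles):
    """8 nibbles -> 4 bytes; bit k of byte i is bit i of nibble k."""
    return [sum(((n >> i) & 1) << k for k, n in enumerate(nibbles))
            for i in range(4)]


def unpack(regs):
    """Inverse of pack: 4 bytes -> 8 nibbles."""
    return [sum(((regs[i] >> k) & 1) << i for i in range(4))
            for k in range(8)]


S_MONOS = compile_bitsliced(SBOX)
INV_MONOS = compile_bitsliced(INV_SBOX)
nibbles = [0x0, 0x1, 0x2, 0x3, 0xC, 0xD, 0xE, 0xF]
print(unpack(eval_bitsliced(S_MONOS, pack(nibbles))))
```

The ANF route is chosen only because it is easy to verify against the lookup table. A search tool such as [41] finds far shorter circuits by sharing intermediate terms.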
We first found the bitsliced implementations of the 4×4 S-box (resp. inverse S-box) of PRINCE
using an automatic search tool [41]. The operations in the resulting bitsliced implementations use the three-operand
‘operator destination, source1, source2’ instruction format. On AVR ATtiny, however, the destination register of an
instruction is also one of its source registers, i.e. it uses the two-operand ‘operator destination, source’ instruction
format. Thus, we translate the original instruction sequences into two-operand instruction sequences manually. In our
translation, we try to minimize the required clock cycles and to compute in place, i.e. the outputs end up in the same
registers as the inputs.
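The format gap can be illustrated with a naive, mechanical translator. This is a minimal sketch, not the authors' manual procedure. It assumes the three-operand program uses only AND, OR, XOR and NOT, written here with the AVR mnemonics `and`, `or`, `eor` and `com`, and that register names are plain strings (the `r16`-style names in the example are illustrative). It emits an extra `mov` whenever the destination differs from both sources. Because it performs no register renaming, it may need more copies than a careful hand translation that also keeps the computation in place.

```python
BINARY = {"and", "or", "eor"}  # AVR AND, OR, EOR are all commutative


def to_two_operand(prog):
    """prog: list of (op, dst, src1, src2) three-operand instructions
    (src2 is None for the unary 'com').
    Returns AVR-style two-operand instructions as strings."""
    out = []
    for op, dst, a, b in prog:
        if op == "com":  # NOT: AVR COM Rd complements Rd in place
            if dst != a:
                out.append(f"mov {dst}, {a}")
            out.append(f"com {dst}")
        elif op in BINARY:
            if dst == a:
                out.append(f"{op} {dst}, {b}")
            elif dst == b:  # commutativity lets us swap the sources
                out.append(f"{op} {dst}, {a}")
            else:  # dst is neither source: copy one source first
                out.append(f"mov {dst}, {a}")
                out.append(f"{op} {dst}, {b}")
        else:
            raise ValueError(f"unsupported op: {op}")
    return out


prog = [("eor", "r20", "r16", "r17"),
        ("and", "r16", "r16", "r18"),
        ("or", "r17", "r18", "r17"),
        ("com", "r19", "r19", None)]
for line in to_two_operand(prog):
    print(line)
```

Every `mov` this produces adds one instruction. This is why the translated sequences in the text contain extra copy instructions beyond the logical terms.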
The primary bitsliced implementation of the S-box (resp. inverse S-box) of PRINCE needs 17 (resp. 16) terms.
Translated into AVR instructions, it becomes an instruction sequence of length 17 + 4 = 21 (resp. 16 + 6 = 22),
including 4 (resp. 6) additional copy-register (MOV) instructions. By taking advantage of the copy-register-pair
(MOVW) instruction and processing 16 S-boxes together instead of 8, the S-layer (resp. inverse S-layer) of
PRINCE needs 17×2 + 4 = 38 (resp. 16×2 + 6 = 38) instructions, instead of 21×2 = 42 (resp. 22×2 = 44).
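The counts above follow a simple cost model, sketched below. The model is our reading of the text, not a statement from the authors. It assumes that processing 16 S-boxes means running the logical terms once per 8-S-box half, and that each copy needed by both halves is done by a single MOVW on a register pair. The baseline instead runs the full 8-S-box sequence, including its MOV instructions, twice. The function name `s_layer_cost` is hypothetical.

```python
def s_layer_cost(logic_terms: int, copies: int) -> tuple[int, int]:
    """Instruction counts for a 16-S-box layer under the assumed model.
    Returns (8-S-box sequence run twice with MOV, 16 S-boxes with MOVW)."""
    twice_8 = 2 * (logic_terms + copies)  # every instruction repeated per half
    paired_16 = 2 * logic_terms + copies  # logic per half, one MOVW per copy pair
    return twice_8, paired_16


print(s_layer_cost(17, 4))  # S-layer
print(s_layer_cost(16, 6))  # inverse S-layer
```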
1 This name is borrowed from the names of the KECCAK-f state parts [42].