Abstract— RoadRunneR-128 is a recently invented lightweight, Feistel-type bitslice block cipher with a block size of 64 bits and a key length of 128 bits. RoadRunneR is specifically designed to offer better performance on resource-constrained 8-bit platforms. The cipher is highly optimised for implementation on 8-bit CPUs and has provable security against linear and differential attacks. This paper deals with the design and hardware implementation of a soft IP core for RoadRunneR-128 on an FPGA. It then discusses the performance, resource utilization and estimated power consumption of the design on an ALTERA DE1 Cyclone II FPGA. The implemented design of RoadRunneR-128 is well suited for lightweight platforms: it operates at a maximum clock frequency of 272.18 MHz, with a throughput of 65 Mbps and an efficiency of 0.081 Mbps/slice. The work presented here outperforms earlier hardware implementations published since the cipher's introduction in 2015. The implemented cipher is found to be lighter, with performance and security comparable to those of competitors such as AES, PRIDE and SPECK.
Keywords— Bitslice cipher, Block ciphers, FPGA, IP core, Lightweight cryptography, RoadRunneR-128.
I. INTRODUCTION
Block ciphers are the most popular ciphers used in cryptographic applications. They have generally been considered more secure and reliable than stream ciphers. Complex, highly secure algorithms require more resources in terms of memory, computational capability and speed.

Designing block ciphers for resource-constrained 8-bit CPUs is a challenging problem. Many recent lightweight ciphers are designed for better performance in hardware with low computing capability. Other, software-efficient lightweight ciphers either lack a security proof or have a low security margin. Hence there is a need for a block cipher that is efficient, highly secure and provably resistant to cryptanalytic attacks. RoadRunneR is an encryption algorithm developed in 2015 by Adnan Baysal and Suhap Sahin. This lightweight block cipher targets 8-bit platforms, and its security is provable against integral, differential and linear attacks [1]. The rest of the paper is organized as follows. Section II describes the top-level structure of RRR-128 (RoadRunneR-128).

The algorithm design and the detailed structure are presented in Section III. Section IV presents the design and hardware implementation of the cipher; the RTL design was done in the Verilog language. The software developed for test vector generation and validation of results is described in Section V. Section VI presents the throughput, area, resource consumption and power dissipation of the implemented design on the ALTERA DE1 Cyclone II FPGA.
II. TOP LEVEL STRUCTURE OF RRR-128
RoadRunneR-128 is a bitslice block cipher with a 64-bit block size and a 128-bit key.

Twelve rounds are required to encrypt a plaintext. Fig. 1 shows the top-level structure of RoadRunneR-128.
Fig. 1. Top Level Structure of RoadRunneR-128
Design and Implementation of IP Core for RoadRunneR-128 Block Cipher
Mitha Raj, Shinta Joseph K, Josemon Tomy, Niveditha K S, Anna Johnson, Dept. of Electronics and Communication, Jyothi Engineering College, Thrissur, India, [email protected]
R, Scientist 'C', National Institute of Electronics and Information Technology (NIELIT), Calicut, India
Mitu Raj, Centre for Development of Advanced Computing (CDAC), Trivandrum, India
The encryption algorithm of the cipher is performed in the data processing unit.

The data processing unit takes a plaintext and a master key as its inputs. An n-bit block cipher encrypts an n-bit plaintext into an n-bit ciphertext. The key generation unit generates the whitening keys and the round keys for the respective rounds: 32-bit initial and final whitening keys and 96-bit round keys. The round keys for each round are derived from the 128-bit master key.
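Steps 4, 5, 7 and 11 of Section III, together with the remark in Section II.A that the round keys repeat every four rounds, are consistent with the 32-bit words A, B, C and D being consumed in cyclic order. The Python sketch below assumes exactly that cyclic order (an inference from this text, not a quotation of the inventors' specification): initial whitening takes A, round 0 takes B, C, D, round 1 takes A, B, C, and so on, so that the final whitening after round 11 lands on B. The names `split_key` and `key_schedule` are illustrative.

```python
NUM_ROUNDS = 12        # RoadRunneR-128 uses 12 rounds
WORDS_PER_ROUND = 3    # one 32-bit sub-key per SLK layer


def split_key(key128):
    """Step 4: split a 128-bit master key into [A, B, C, D] (A = most significant)."""
    return [(key128 >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def key_schedule(key128, rounds=NUM_ROUNDS):
    """Return (initial whitening key, per-round sub-key triples, final whitening key).

    ASSUMPTION: the 32-bit words are consumed cyclically: A, B, C, D, A, B, ...
    """
    words = split_key(key128)

    def word(index):
        # Cyclic access into [A, B, C, D]
        return words[index % 4]

    wk_in = word(0)  # Step 5: initial whitening uses A
    round_keys = [
        tuple(word(1 + WORDS_PER_ROUND * r + j) for j in range(WORDS_PER_ROUND))
        for r in range(rounds)
    ]  # round 0 -> (B, C, D), as in Step 7
    wk_out = word(1 + WORDS_PER_ROUND * rounds)  # Step 11: final whitening (B)
    return wk_in, round_keys, wk_out


if __name__ == "__main__":
    KEY = 0xAAAAAAAA_BBBBBBBB_CCCCCCCC_DDDDDDDD
    names = dict(zip(split_key(KEY), "ABCD"))
    wk_in, rks, wk_out = key_schedule(KEY)
    print("initial whitening:", names[wk_in])
    for r, rk in enumerate(rks):
        print(f"round {r:2}:", " ".join(names[k] for k in rk))
    print("final whitening:", names[wk_out])
```

With four distinct words this prints B C D for round 0 and A B C for round 1, and round 4 repeats the pattern of round 0, which matches the four-round repetition stated in the text.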

Fig. 2. Feistel Structure of RoadRunneR-128, Internal Structure of the Round Function F, and Internal Structure of the SLK Layer.

A. Detailed Structure
Fig. 2 shows the Feistel structure of the cipher, the operations inside the round function, and the internals of the SLK layer.
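The Feistel skeleton implied by Steps 5 to 12 of Section III can be sketched in Python as follows. The round function is replaced by a hypothetical placeholder, `toy_f`, which is not RoadRunneR's F; the sketch only shows where the whitening, the XOR into the right half and the swap occur, and why decryption simply runs the rounds backwards whatever F is. The round keys and whitening keys in the demo are arbitrary values.

```python
MASK32 = 0xFFFFFFFF


def toy_f(x, round_key, i):
    """HYPOTHETICAL placeholder for the round function F (NOT RoadRunneR's F).
    A Feistel network is invertible whatever F is, so any 32-bit function works."""
    a, b, c = round_key
    return ((((x ^ a) * 0x9E3779B1) + b) ^ (c + i)) & MASK32


def feistel_encrypt(left, right, wk_in, round_keys, wk_out, f=toy_f):
    left ^= wk_in                          # Step 5: initial whitening of the left half
    last = len(round_keys) - 1
    for i, rk in enumerate(round_keys):
        right ^= f(left, rk, i)            # Step 8: F output XORed into the right half
        if i != last:
            left, right = right, left      # Step 9: swap (not done after the last round)
    left ^= wk_out                         # Step 11: final whitening of the left half
    return left, right                     # Step 12: ciphertext = left || right


def feistel_decrypt(left, right, wk_in, round_keys, wk_out, f=toy_f):
    left ^= wk_out                         # undo final whitening
    last = len(round_keys) - 1
    for i in range(last, -1, -1):
        if i != last:
            left, right = right, left      # undo the swap of round i
        right ^= f(left, round_keys[i], i) # XOR cancels the same F output
    left ^= wk_in                          # undo initial whitening
    return left, right


if __name__ == "__main__":
    demo_keys = [(r, 2 * r, 3 * r) for r in range(12)]   # arbitrary demo round keys
    ct = feistel_encrypt(0x01234567, 0x89ABCDEF, 0xA5A5A5A5, demo_keys, 0x5A5A5A5A)
    pt = feistel_decrypt(*ct, 0xA5A5A5A5, demo_keys, 0x5A5A5A5A)
    print(f"ciphertext halves: {ct[0]:08X} {ct[1]:08X}")
    print("round trip ok:", pt == (0x01234567, 0x89ABCDEF))
```

Skipping the swap after the last round is what lets decryption reuse the same loop structure with the round keys taken in reverse order.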

The round function (F) of RoadRunneR-128 has an SPN (substitution-permutation network) structure of four layers: three SLK layers and one S layer. For each round function, 96 bits of round keys are generated from the 128-bit master key by the key generation unit. The SLK layer consists of a substitution layer (S), a diffusion layer (L) and key addition (K). As shown in Fig. 2, the bits are permuted before and after the S and L layers, so that each S-box and each L function receives bits from different bytes, which strengthens the algorithm. A round constant Ci is XORed with the least significant byte of the word after the second SLK function, as shown in Fig. 2.
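The round constant rule given in Section III, $C_i = N_R - i$ for $i = 0, 1, \ldots, N_R - 1$ with $N_R = 12$, can be checked against Table II with a short Python sketch. The helper names are illustrative; `add_round_constant` only XORs $C_i$ into the least significant byte (x3) of the 32-bit word, which is all the text specifies.

```python
NR = 12  # number of rounds in RoadRunneR-128


def round_constant(i, nr=NR):
    """C_i = N_R - i for rounds i = 0 .. N_R - 1, as an 8-bit value."""
    if not 0 <= i < nr:
        raise ValueError(f"round index {i} outside 0..{nr - 1}")
    return (nr - i) & 0xFF


def add_round_constant(word32, i):
    """XOR C_i into the least significant byte (x3) of the 32-bit word
    produced by the second SLK operation of round i."""
    return word32 ^ round_constant(i)


if __name__ == "__main__":
    for i in range(NR):
        print(f"C{i:<2} {round_constant(i):08b}")  # one row per entry of Table II
```

Since every $C_i$ is at most 12, the XOR never touches the upper three bytes of the word.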

1) S-Box Layer (S): Bitslice S-boxes are widely used in lightweight cryptography to reduce the size of look-up tables. Block ciphers such as PRESENT [2], SEA [3], PRIDE [4], RECTANGLE [5] and NOEKEON [6] use bitslice S-boxes with different S-box layer design strategies. The inventors performed a brute-force search over candidate S-boxes and selected the 4×4 S-box given in [7] for RoadRunneR-128, since it provides the lowest linearity (correlation).
2) Diffusion Layer (L): The linear function used for diffusion is given by
$$L(x) = (x \lll 1) \oplus (x \lll 2) \oplus x \tag{1}$$
where $\lll$ denotes left rotation of a byte and $\oplus$ denotes XOR.
3) Key Addition Layer (K): The F function has three key addition layers (one in each SLK layer), each using a 32-bit sub-key, as shown in Fig. 3. In this layer the corresponding sub-key is XORed with the input. The same round keys repeat every four rounds. Initial and final whitening are applied in the first and last rounds.
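A reference model of one SLK operation helps to see how the bitsliced S layer, the byte-wise diffusion function L(x) = x ⊕ (x <<< 1) ⊕ (x <<< 2) and the key addition fit together. The Python sketch below rests on assumptions that this text does not fix: the S-box values of Table I are not reproduced here, so PRESENT's S-box is used purely as a stand-in; S-box j is assumed to take bit j of each of the four bytes, with x0 as the most significant input bit; and the sub-key is XORed with x0 as its most significant byte. It illustrates the data flow only and will not reproduce RoadRunneR-128 ciphertexts.

```python
# Placeholder 4-bit S-box: this is PRESENT's S-box, NOT the RoadRunneR-128
# S-box of Table I (whose values are not reproduced in this text).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def rotl8(x, r):
    """Rotate an 8-bit value left by r positions (0 < r < 8)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF


def l_func(x):
    """Diffusion on one byte: L(x) = x ^ (x <<< 1) ^ (x <<< 2)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2)


def s_layer(state, sbox=SBOX):
    """Bitsliced S layer on four bytes [x0, x1, x2, x3].

    S-box j (j = 0..7) takes bit j of every byte. ASSUMPTION: x0 supplies the
    most significant bit of the 4-bit S-box input, and the output bits are
    written back to bit j of the same bytes in the same order.
    """
    out = [0, 0, 0, 0]
    for j in range(8):
        nibble = 0
        for k in range(4):
            nibble = (nibble << 1) | ((state[k] >> j) & 1)
        y = sbox[nibble]
        for k in range(4):
            out[k] |= ((y >> (3 - k)) & 1) << j
    return out


def slk(state, subkey):
    """One SLK operation: S layer, byte-wise L, then XOR with a 32-bit sub-key
    (x0 is taken as the most significant byte of the sub-key)."""
    diffused = [l_func(b) for b in s_layer(state)]
    return [b ^ k for b, k in zip(diffused, subkey.to_bytes(4, "big"))]
```

Because 1 + x + x² shares no factor with x⁸ + 1 over GF(2), L is a bijection on bytes; for example, L(0x01) = 0x07 and L(0x80) = 0x83.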

III. COMPLETE STRUCTURE AND ALGORITHM DESIGN
The algorithm of RoadRunneR-128 is described below.
A. Algorithm
Step 1: The 64-bit plaintext and the 128-bit key are the inputs to the cipher.

The encryption requires 12 rounds (0 to 11).
Step 2: Both the plaintext and the key are given in hexadecimal.
Step 3: Split the plaintext into left and right halves of 32 bits each. Left half: x0||x1||x2||x3; right half: x4||x5||x6||x7.
Step 4: Split the 128-bit key into four 32-bit sub-keys A||B||C||D.
Step 5: Whiten the left half of the plaintext by XORing it with sub-key A ("initial whitening") [9].
Step 6: Give the result to the round function F.
Step 7: In round 0, perform the following operations, as shown in Fig. 3, Fig. 4 and Fig. 5.
Fig. 3. Operations Inside the Round Function
1) First SLK Operation
a) Substitution (S)
• The 32-bit input is divided into four bytes of 8 bits each.

• The bits of each byte are then permuted and distributed across eight 4-bit S-boxes in the S layer, as shown in Fig. 4.
• Each 4-bit S-box input is then substituted with another 4-bit value according to the look-up table given in Table I.
Fig. 4. Inside the SLK Layer
TABLE I LOOK-UP TABLE FOR S-BOXES
b) Diffusion Layer (L)
• The 32-bit output of the S layer is divided into eight nibbles of 4 bits each.
• The bits of each nibble are then permuted and distributed across four 8-bit L-boxes in the diffusion layer (L layer).

• The linear function applied to each byte for diffusion is
$$L(x) = x \oplus (x \lll 1) \oplus (x \lll 2) \tag{2}$$
c) Key Addition (K)
• The output of the diffusion layer is XORed with sub-key B.
• Three sub-keys are used in each round; they are selected using the key schedule shown in Fig. 1.

Fig. 5. Detailed Structure of Round Function
2) Second SLK Operation
• Perform the above three steps (S, L, K) using the next sub-key, C.
• After the second SLK operation, the round constant is XORed into the least significant byte (the rightmost byte, i.e., x3) of the 32-bit output.
• For round $i = 0, 1, \ldots, N_R - 1$, the round constant is $C_i = N_R - i$, where $N_R$ is the number of rounds, and $C_i$ is represented as an 8-bit little-endian integer. Table II lists $C_i$ for the corresponding rounds.
TABLE II ROUND CONSTANTS FOR EACH ROUND
Ci     Value
C0     00001100
C1     00001011
C2     00001010
C3     00001001
C4     00001000
C5     00000111
C6     00000110
C7     00000101
C8     00000100
C9     00000011
C10    00000010
C11    00000001
3) Third SLK Operation
• Perform the same SLK operation with the next sub-key, D.
4) Final S Operation
• Perform the S-layer operation again on the above 32 bits and finally permute the output.
Step 8: The obtained result is the new left half.

It is then XORed with the current 32-bit right half (obtained in Step 3 for round 0) to obtain the new right half.
Step 9: Swap both halves and feed them to the next round, i.e., round 1. The detailed operation inside the round function is shown in Fig. 5.
Step 10: Repeat Steps 7, 8 and 9 for the remaining 11 rounds with the sub-keys defined for each round.
Step 11: After round 11, the ciphertext is obtained by whitening (XORing) the left half computed in round 11 with sub-key B. (Note that no swap is done after round 11.)

Step 12: The left and right halves are then combined to obtain the 64-bit ciphertext corresponding to the input plaintext and master key.
IV. RTL DESIGN AND HARDWARE IMPLEMENTATION
The RTL design of RoadRunneR-128 was done in Verilog. The Verilog HDL code was simulated using ModelSim 10.1 and synthesized using Altera Quartus II for the ALTERA DE1 Cyclone II FPGA. The simulation and implementation details are given below.
A. RTL Design and Simulation Results
To reduce logic complexity and to make the design simpler and faster, an FSM-based design approach has been employed at the RTL level.

It includes pipelining of complex logic over multiple clock cycles [10].
Fig. 6. Simulation of RoadRunneR-128
Fig. 6 shows the simulation result of RoadRunneR-128 for the test vectors: plaintext 0xh and key 0xh.

The input clock frequency for simulation is 100 MHz, and the simulation is done in the ModelSim simulator.
B. Implementation Results
After successful functional simulation, the design was synthesized using the Altera Quartus II software.
Fig. 7. Implementation Results
Fig. 7 shows the FPGA implementation results for test vector 0x00, as obtained in the In-System Memory Content Editor of Quartus II after downloading the bit file into the FPGA. The RoadRunneR-128 IP core was implemented and tested on the ALTERA DE1 Cyclone II FPGA. The interface signals and the RTL schematic of the designed soft IP core are shown in Table III and Fig. 8, respectively.

Various results and reports regarding resource consumption, maximum frequency of operation and power consumption were obtained after synthesis and place-and-route. They are discussed in Section VI.
TABLE III INTERFACE SIGNALS OF THE IP CORE
Interface Signal   Description                 Bit Length
clock              Input clock signal          1
plaintext          Input plaintext             64
key                Input master key            128
reset              Asynchronous reset input    1
start              Chip enable input           1
ciphertext         Ciphertext output           64
done               Status signal output        1
Fig. 8. RTL Schematic of RoadRunneR-128
V. TEST VECTOR GENERATION TOOL
A test vector generation tool has been developed for RoadRunneR-128 to derive test vectors and to validate the results obtained in functional simulation.
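When validating simulation output against the tool or against Table IV, the hexadecimal strings first have to be parsed and split as in Steps 2 to 4 of Section III. The Python sketch below shows one way to do this. The function names are illustrative, and `encrypt` in `check_vector` stands for a hypothetical software model of the cipher; it is not part of the tool described here. The key string follows Table IV, where the 128-bit key is printed as two 64-bit groups.

```python
def parse_hex(text):
    """Parse a hex value written with '_' separators and spaces, as in Table IV."""
    return int(text.replace("_", "").replace(" ", ""), 16)


def split_block(pt64):
    """Step 3: 64-bit block -> (left, right) 32-bit halves."""
    return (pt64 >> 32) & 0xFFFFFFFF, pt64 & 0xFFFFFFFF


def split_master_key(key128):
    """Step 4: 128-bit key -> [A, B, C, D] (A = most significant 32 bits)."""
    return [(key128 >> s) & 0xFFFFFFFF for s in (96, 64, 32, 0)]


def check_vector(encrypt, vector):
    """Compare encrypt(pt64, key128) with the expected ciphertext.
    `encrypt` is a HYPOTHETICAL software model of the cipher."""
    pt = parse_hex(vector["plaintext"])
    key = parse_hex(vector["key"])
    expected = parse_hex(vector["ciphertext"])
    actual = encrypt(pt, key)
    return actual == expected, actual, expected


# First entry of Table IV.
VECTOR_1 = {
    "plaintext": "0000_0000_0000_0002",
    "key": "8000_0000_0000_0000 0000_0000_0000_0000",
    "ciphertext": "C168_C69A_C195_845E",
}
```

Returning both the actual and the expected value makes it easy to print a mismatch in the same underscore-grouped format used by the tool.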

Fig. 9. Test Vector Generator
This GUI-based software for RoadRunneR-128 was developed in Microsoft Visual Studio. Fig. 9 shows the tool with the plaintext and key as inputs and the ciphertext as output. Besides the test vectors provided by the inventors of the cipher, more test vectors can be derived using this GUI tool. The simulation results were verified against it.

Table IV lists some of the test vectors derived for RRR-128 using the tool.
TABLE IV TEST VECTORS FOR ROADRUNNER-128
VI. RESULTS AND COMPARISONS
The flow summary obtained after synthesizing the code in Quartus II for the ALTERA DE1 Cyclone II FPGA is shown in Fig. 10. The design uses 802 logic elements (slices), i.e., 2% of the device. Timing was verified with a maximum clock frequency of operation Fmax = 272.18 MHz for the slow corner model.

Power consumption was estimated to be 140.74 mW after power analysis of the design.
The design takes 268 clock cycles to encrypt a 64-bit plaintext for a given key. Hence, the total time to encrypt a plaintext is $T_{en} \approx 0.98\,\mu s$ for $F_{clk}$ = 272.18 MHz, and the throughput of the implementation is therefore 65 Mbps. The throughput per area efficiency (T/A metric) of the implemented cipher has been calculated as 0.081 Mbps/slice.
The FPGA implementation of the RoadRunneR-128 algorithm presented in this paper is the first of its kind. A recent research work implemented RRR-128 in 0.18 µm CMOS technology at a 100 kHz clock, with a very low throughput of 156 Kbps [11]. The AES (Advanced Encryption Standard) cipher is one of the competitors of RoadRunneR-128. The research in [12] analyses the performance of FPGA implementations of different AES variants, such as AES-128, AES-192 and AES-256, on the Xilinx Virtex-7 FPGA family. Their throughputs are of the order of 1 Gbps, and their area consumption is in the range of 12000–15000 LUTs (look-up tables). Even though RoadRunneR-128 is a different algorithm and was implemented on a different FPGA architecture, the given implementation is observed to be more area-efficient (in terms of logic elements) than all of the above implementations.
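These figures follow directly from the cycle count, Fmax and the logic-element count reported above. The short Python check below reproduces them, together with the throughput ratio to the 156 Kbps CMOS implementation of [11] that underlies the "over 400 times" claim in the conclusion; the constant names are illustrative.

```python
CYCLES_PER_BLOCK = 268    # clock cycles per 64-bit encryption (Section VI)
F_CLK_HZ = 272.18e6       # Fmax, slow corner model
BLOCK_BITS = 64
LOGIC_ELEMENTS = 802      # logic elements reported in the flow summary
PRIOR_ASIC_KBPS = 156     # 0.18 um CMOS implementation at 100 kHz [11]

t_enc = CYCLES_PER_BLOCK / F_CLK_HZ                # seconds per block
throughput_mbps = BLOCK_BITS / t_enc / 1e6         # equals 64 * Fclk / 268
efficiency = throughput_mbps / LOGIC_ELEMENTS      # Mbps per logic element
speedup = throughput_mbps * 1e3 / PRIOR_ASIC_KBPS  # ratio to the 156 Kbps design

print(f"T_enc      = {t_enc * 1e6:.2f} us")
print(f"Throughput = {throughput_mbps:.1f} Mbps")
print(f"T/A        = {efficiency:.3f} Mbps per logic element")
print(f"Speed-up   = {speedup:.0f}x")
```

It prints T_enc = 0.98 us, a throughput of 65.0 Mbps, 0.081 Mbps per logic element and a speed-up of about 417.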

However, the throughput of the given implementation is lower. A Triple DES (Data Encryption Standard) implementation on the same FPGA as RRR-128 has a higher throughput of 3 Gbps, but its area consumption is much higher, at around 40% utilisation [13]. A similar lightweight algorithm implemented on the same FPGA has similar area consumption, but its throughput is only 200 kbps [14]. Table VII compares the security of RRR-128 with competitors such as AES, PRESENT, PRIDE and SPECK.

Cryptanalysis based on differential, integral and MITM (meet-in-the-middle) attacks shows that at most 6 of the 12 rounds of RRR-128 can be attacked, which is the best among the given ciphers [15].
Fig. 10. Flow Summary
TABLE V POWER CONSUMPTION
Power Dissipated
Total Thermal Power Dissipation          140.74 mW
I/O Thermal Power Dissipation            60.72 mW
Core Static Thermal Power Dissipation    80.03 mW
TABLE VI MAXIMUM FREQUENCY OF OPERATION
Maximum Clock Frequency of Operation (Fmax)    272.18 MHz
TABLE VII SECURITY COMPARISON WITH OTHER CIPHERS
Cipher       Attacked Rounds
RRR-128      6/12
AES          7/10
PRIDE        26/31
SPECK-128    17/27
PRESENT      26/31
SIMON        26/42
Table IV contents:
PLAIN TEXT            KEY                                        CIPHER TEXT
0000_0000_0000_0002   8000_0000_0000_0000 0000_0000_0000_0000    C168_C69A_C195_845E
0010_0020_0030_0040   0000_0000_0000_0001 0000_0000_0000_0001    3109_48CF_D78E_57B4
0010_0200_0000_0000   0001_0000_0000_0001 0001_0000_0000_0001    52BB_4E1A_331D_91BF
FEDC_3210_0002_0000   0123_4567_0000_CDEF 0123_4567_0000_CDEF    E45B_1D93_75E2_7364
1000_1002_5000_4000   1111_2222_3333_4444 1111_2222_3333_4444    0DF2_9A4F_C5BF_5BFF
1023_2050_1147_8124   1000_4000_5000_2222 1000_4000_5000_2222    BB76_8D15_1B18_616F
VII. CONCLUSION
In this paper, we have presented the design and implementation of a soft IP core for RoadRunneR-128, a lightweight block cipher. The results obtained for various test vectors have been successfully verified against the original results of the inventors of the algorithm.

The implemented design is a compromise between speed and area. Compared with its competitors, the RRR-128 core performs efficiently, with lower area utilization, reasonable throughput and proven security. The presented work also outperforms the previous implementation by a factor of over 400 in terms of throughput. Hence the implemented cipher is well suited for lightweight applications. The FPGA implementation of the RoadRunneR-128 block cipher presented in this work is the first of its kind. Future research may improve the performance and efficiency of the current design through techniques such as optimum pipelining and loop unrolling.
ACKNOWLEDGMENT
The authors would like to thank the inventors of the algorithm, Mr. Adnan Baysal and Mr. Suhap Sahin, for their valuable feedback and timely help.
REFERENCES
[1] Adnan Baysal and Suhap Sahin, "RoadRunneR: A Small and Fast Bitslice Block Cipher for Low Cost 8-bit Processors," presented at LightSec 2015, Bochum, Germany, September 2015. Available: https://eprint.iacr.org/2015/906
[2] Andrey Bogdanov et al., "PRESENT: An Ultra-Lightweight Block Cipher," in Lecture Notes in Computer Science, vol. 4727. Springer, 2007, pp. 450–466.
[3] François-Xavier Standaert et al., "SEA: A scalable encryption algorithm for small embedded applications," in Lecture Notes in Computer Science, vol. 3928. Springer, 2006, pp. 222–236.
[4] Martin R. Albrecht et al., "Block ciphers – focus on the linear layer (featuring PRIDE)," in Proc. CRYPTO 2014 – 34th Annual Cryptology Conference, Santa Barbara, CA, USA, August 17–21, 2014, pp. 57–76.
[5] Wentao Zhang et al., "RECTANGLE: A bit-slice ultra-lightweight block cipher suitable for multiple platforms," IACR Cryptology ePrint Archive 2014:84, 2014. Online. Available: https://eprint.iacr.org/2014/084
[6] Joan Daemen, Michaël Peeters, Gilles Van Assche, and Vincent Rijmen, "Nessie proposal: Noekeon." Online. Available: https://gro.noekeon.org/Noekeon-spec.pdf
[7] Markus Ullrich et al., "Finding optimal bitsliced implementations of 4×4-bit S-boxes," in Proc. SKEW 2011 Symmetric Key Encryption Workshop, Copenhagen, Denmark, June 2011, pp. 16–17.

[8] Vincent Grosso, Gaëtan Leurent, François-Xavier Standaert, and Kerem Varici, "LS-designs: Bitslice encryption for efficient masked software implementations," in Proc. 21st International Workshop, FSE 2014, London, UK, March 3–5, 2014, pp. 18–37.

[9] Pierre-Alain Fouque and Pierre Karpman, "Security Amplification against Meet-in-the-Middle Attacks Using Whitening," in Proc. IMA International Conference on Cryptography and Coding, Oxford, UK, 2013, pp. 252–269.
[10] F. Ferrandi, P. L. Lanzi, G. Palermo, C. Pilato, D. Sciuto and A. Tumeo, "An Evolutionary Approach to Area-Time Optimization of FPGA designs," in Proc. 2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, Samos, 2007, pp. 145–152.
[11] J. Liu, G. Bai and X. Wu, "Efficient Hardware Implementation of Roadrunner for Lightweight Application," 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, 2016, pp. 224–227.
[12] N. S. S. Srinivas and M. Akramuddin, "FPGA based hardware implementation of AES Rijndael algorithm for Encryption and Decryption," 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, 2016, pp. 1769–1776.
[13] Edni Del Rosal and Sanjeev Kumar, "A Fast FPGA Implementation for Triple DES Encryption Scheme," Circuits and Systems, vol. 8, no. 10, pp. 237–246, August 2017.
[14] Chanthini Baskar, C. Balasubramaniyan and D. Manivannan, "Establishment of Light Weight Cryptography for Resource Constraint Environment Using FPGA," Procedia Computer Science, vol. 78, pp. 165–171, 2016.
[15] Celine Blondeau and Kaisa Nyberg, "Links between truncated differential and multidimensional linear properties of block ciphers and underlying attack complexities," in Proc. Advances in Cryptology – EUROCRYPT 2014 – 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11–15, 2014, pp. 165–182.