Available online at www.sciencedirect.com

SciVerse ScienceDirect

Procedia Engineering 29 (2012) 2848 - 2852

www.elsevier.com/locate/procedia

2012 International Workshop on Information and Electronics Engineering (IWIEE)

Block RAM Based Design of 8-bit AES Operation Modes

Chi-Wu Huang, Hong-You Chen, Hsing-Chang Yeh, Chi-Jeng Chang

Dept. of Applied Electronics Technology, National Taiwan Normal University, Taipei, Taiwan

Abstract

An 8-bit AES implementation was first proposed in 2006 as an Application-Specific Instruction Processor (ASIP).[6] It featured a low-area design for the increasingly popular wireless and embedded applications, based on the stored-program concept in which software programs run on a processor built in an FPGA. This paper presents a direct FPGA implementation, in which the circuit area varies depending on the algorithms and the implementation methods used. Our specific 8-bit implementation using three Block RAMs (BRAMs) occupies 88 slices in CFB/OFB mode and 134 slices in ECB mode, which is better than or very close to the 122 slices of the ASIP. The throughputs, over 31 megabits per second (Mbps), are at least 14 times higher than the 2.18 Mbps achieved by the ASIP.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Harbin University of Science and Technology

Keywords: AES, ECB, CFB, OFB, Block RAM (BRAM)

1. Introduction

The Advanced Encryption Standard (AES) was announced by the National Institute of Standards and Technology (NIST) in 2001.[1,2] It is a symmetric block cipher intended to replace DES as the approved standard for a wide range of applications. The 128-bit data-path AES is typically used for high-speed web servers.[3] The 32-bit AES was presented in 2003 for low-power and low-area designs.[4,5]

The 8-bit AES design was presented after 2005 for designs with even lower area. Reference [6] proposed an 8-bit Application-Specific Instruction Processor (ASIP) with 15 instructions for programming the encipher, decipher and key expansion processes. The programs stored in a Block RAM (BRAM) are executed in the processor, which includes control, data memory, and multiplier-accumulator units to perform $GF(2^8)$ arithmetic operations. The ASIP uses 122 slices and achieves 2.18 Mbps.

* Corresponding author.

E-mail address: wilsonwi@hotmail.com.

1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2012.01.402

This paper presents another approach: a direct FPGA implementation of 8-bit AES using BRAMs to perform key expansion, ShiftRow and byte substitution. The slice area varies from 100 to 200 slices, depending on the detailed implementation of the different design algorithms. It achieves much higher speed than the ASIP owing to the direct circuit implementation.

2. AES Algorithm

The AES algorithm is a round-based symmetric block cipher that processes a 128-bit data block using a cipher key of 128, 192, or 256 bits. A sequence of four primitive functions, SubByte, ShiftRow, MixColumn and AddRoundKey, forms a loop called a round, which is executed Nr-1 times. The number of rounds Nr can be 10, 12, or 14 depending on the key size. The SubByte operation is a nonlinear byte substitution that operates independently on each byte of the state using a substitution table. The ShiftRow operation circularly shifts the rows of the state by different numbers of bytes (offsets). The MixColumn operation mixes the bytes in each column by multiplying the state with a fixed polynomial ($3x^3 + x^2 + x + 2$ for encryption, $bx^3 + dx^2 + 9x + e$ for decryption, with hexadecimal coefficients) modulo $x^4 + 1$. The AddRoundKey operation is an XOR process that adds a round key to the state in each round.[1]
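The MixColumn multiplication described above can be sketched in software as follows. This is a minimal illustrative model, not the paper's combinational circuit; the function names `gf_mul` and `mix_one_column` are hypothetical, and the reduction polynomial 0x11B is the standard AES field polynomial from FIPS 197.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:  # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return product


# Coefficients of the fixed polynomials, highest power of x first.
ENC_POLY = (0x03, 0x01, 0x01, 0x02)  # 3x^3 + x^2 + x + 2
DEC_POLY = (0x0B, 0x0D, 0x09, 0x0E)  # bx^3 + dx^2 + 9x + e


def mix_one_column(column, poly):
    """Multiply a 4-byte column by a fixed polynomial modulo x^4 + 1."""
    coeff = poly[::-1]  # coeff[k] is the coefficient of x^k
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            # x^4 = 1, so the term x^(i-j) wraps around modulo 4.
            acc ^= gf_mul(coeff[(i - j) % 4], column[j])
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_one_column(column, ENC_POLY)
print([hex(b) for b in mixed])
print(mix_one_column(mixed, DEC_POLY) == column)
```

Applying the decryption polynomial to the mixed column restores the original bytes, because the two polynomials are inverses of each other modulo $x^4 + 1$.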

Fig. 1 summarizes the AES algorithm in two flow diagrams. MixColumn is not performed in the last round, and the order of SubByte and ShiftRow can be switched without affecting the final cipher output.

3. ECB mode

Two deep-color-dotted blocks, one in Encryption and the other in Decryption in Fig. 1, are defined as the Cipher function CIPHk( ) and the Decipher function DCIPk( ), respectively. Fig. 1 can then be simplified by using Cipher and Decipher as shown in Fig. 2, which is exactly the block diagram of the ECB mode described in the NIST publication.[7]

ECB mode needs both the cipher function CIPHk( ) and the decipher function DCIPk( ) to accomplish encryption and decryption. The 8-bit AES implementation is shown in Fig. 3, where three BRAMs are used for the Key RAM, sbox ROM and Shift RAM.

By properly applying three 4-bit counters, C (counting up by 1) as the input address ADDA, and C+5 (counting up by 5, for encryption) or C-3 (counting down by 3, for decryption) as the output address ADDB, the 32x8 Shift RAM can perform ShiftRow/InvShiftRow, because the results of ShiftRow/InvShiftRow are identical to the count sequences of C+5/C-3, shown as follows:[8]

(C+5) sequence: (0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11).

(C-3) sequence: (0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3).
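The equivalence between the counter sequences and ShiftRow/InvShiftRow can be checked with a short sketch. It assumes the usual AES column-major byte ordering of the state (byte i sits in row i mod 4, column i div 4); the function names are hypothetical and the code models only the addressing, not the Shift RAM itself.

```python
def read_addresses(step: int):
    """Read addresses produced by a 4-bit counter that advances by `step` per clock."""
    return [(step * c) % 16 for c in range(16)]


def shift_row_sources(inverse: bool = False):
    """Source byte index for every output byte of ShiftRow (or InvShiftRow)."""
    sign = -1 if inverse else 1
    sources = []
    for out_index in range(16):
        row, col = out_index % 4, out_index // 4
        # Row r is rotated by r columns (left for ShiftRow, right for InvShiftRow).
        sources.append(row + 4 * ((col + sign * row) % 4))
    return sources


enc_sequence = read_addresses(5)   # C+5
dec_sequence = read_addresses(-3)  # C-3
print(enc_sequence)
print(dec_sequence)
print(enc_sequence == shift_row_sources())
print(dec_sequence == shift_row_sources(inverse=True))
```

Writing the state with address C and reading it back with address C+5 or C-3 therefore delivers the bytes already in shifted order, with no extra logic beyond the address counters.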

In Fig. 3, the Shift RAM output sout is serially shifted into (S2, S1, S0). When Pld='1', once every 4 clocks, the 4 consecutive bytes (S2, S1, S0, sout) are loaded in parallel into the 4 rotate registers (Y3, Y2, Y1, Y0) as the 4-byte input to the combinational logic block of MixColumn/InvMixColumn. The MixColumn outputs and Key RAM outputs are XORed to form textout (the InvMixColumn output imx goes to textout directly, since the keys have already been XORed at sout). Textout is also the input for the next round. A total of 11 rounds are cycled to obtain 176 bytes of textout.

130 slices are used to implement the circuit in Fig. 3 including 37 slices for KeyExpansion. If the circuit blocks of brown color are excluded, Fig. 3 performs encryption only and the circuit area reduces to 86 slices.

Figure 1: Flow diagram of AES, (a) Encryption, (b) Decryption

Figure 2: ECB mode

Figure 3: The Encryption/Decrypting of 8-bit AES

4. KeyExpansion

Fig. 4 is a 128-bit key expansion diagram. The 128-bit keyin is loaded into the registers (W0, W1, W2, W3) in parallel, where W3 goes through rotate, sbox and rcon addition to obtain a new 32-bit number, which is serially XORed with (K0, K1, K2, K3) to obtain the output (wk0, wk1, wk2, wk3), which also becomes the input to the next round. The process repeats for 10 rounds (1 clock per round) until 176 key bytes (11x16 bytes) are obtained.

Fig. 5 is the 8-bit version of Fig. 4. In the first round (16 clocks per round), while the 128-bit keyin is being stored serially in the 176x8 Key RAM, the new 32-bit number is generated through kin, rotate, sbox and rcon during the last 4 clocks and is stored in (t3, t2, t1, t0). In the next round, it is serially XORed with 4 bytes of kob (addressed by c-15, equivalent to W0 in Fig. 4) to obtain the first 4 bytes of keyout, which are equivalent to wk0 in Fig. 4. These bytes are fed back through kin and stored serially in (m3, m2, m1, m0), which are equivalent to wk1, wk2 and wk3 in Fig. 4, ready to be XORed with kob to generate the remaining 12 bytes of keyout in Fig. 5. The process continues until all 176 key bytes are expanded. Fig. 5 needs 176 clocks to finish the expansion, instead of the 11 clocks in Fig. 4.
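The byte-serial expansion can be modeled in software as one key byte per loop iteration, mirroring the one-byte-per-clock behavior described above. This is a hedged sketch of the standard FIPS 197 AES-128 key schedule, not a model of the registers (t3..t0), (m3..m0) or the Key RAM addressing in Fig. 5; the names `gf_mul`, `build_sbox` and `expand_key_bytes` are hypothetical, and the S-box is computed from its definition instead of being read from a ROM.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """S-box = affine transform of the multiplicative inverse (brute-force inverse)."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_bytes(key: bytes) -> bytes:
    """Expand a 16-byte key into 176 round-key bytes, one byte per iteration."""
    assert len(key) == 16
    k = bytearray(key)
    rcon = 0x01
    for i in range(16, 176):
        if i % 16 < 4:
            # First word of a round key: rotate, sbox and rcon on the previous word.
            j = i % 4
            last_word = i - j - 4  # start of the last word of the previous round key
            t = SBOX[k[last_word + (j + 1) % 4]]
            if j == 0:
                t ^= rcon
                rcon = gf_mul(rcon, 2)
        else:
            t = k[i - 4]  # byte of the word just generated
        k.append(k[i - 16] ^ t)
    return bytes(k)


round_keys = expand_key_bytes(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(round_keys), round_keys[16:20].hex(), round_keys[160:].hex())
```

The example key is the AES-128 test key from FIPS 197 Appendix A.1. The loop body runs 160 times after the 16 key bytes are loaded, which corresponds to the 176 byte-clocks quoted for Fig. 5.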

Figure 4: 128-bit key expansion

Figure 5: 8-bit key expansion

5. CFB and OFB

Cipher FeedBack (CFB) and Output FeedBack (OFB) modes are shown in Fig. 6, where Pj, Cj and IV denote the plaintext, the ciphertext and the initialization vector, respectively. A block represents 128 bits (16 bytes) of data and j is the block number. In Fig. 6, CFB obtains the next block input to CIPHk( ) from Cj, while OFB obtains it directly from the output of CIPHk( ).

Note that only the cipher function CIPHk( ) is required for CFB/OFB encryption and decryption, whereas ECB requires both CIPHk( ) and DCIPk( ). This is very favorable for low-area hardware implementation. The circuit implementation of CFB/OFB in Fig. 6 needs only around 90 slices.
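The point that CFB/OFB decryption reuses the forward cipher function can be illustrated with a small sketch. To stay self-contained, `ciph` below is a hypothetical stand-in built from SHA-256 and is deliberately not AES; because it has no inverse at all, the successful round trip shows that no decipher function is involved. Full-block (128-bit) feedback is assumed.

```python
import hashlib

BLOCK_BYTES = 16  # one 128-bit block


def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPHk( ): a keyed one-way function, NOT AES.
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cfb(key, iv, blocks, decrypt=False):
    """CFB: the next cipher input is always the ciphertext block Cj."""
    out, feedback = [], iv
    for block in blocks:
        result = xor_bytes(block, ciph(key, feedback))
        feedback = block if decrypt else result
        out.append(result)
    return out


def ofb(key, iv, blocks):
    """OFB: the next cipher input is the previous cipher output; same routine both ways."""
    out, feedback = [], iv
    for block in blocks:
        feedback = ciph(key, feedback)
        out.append(xor_bytes(block, feedback))
    return out


key = b"k" * 16
iv = bytes(range(16))
plaintext = [b"block number one", b"block number two"]

cfb_cipher = cfb(key, iv, plaintext)
ofb_cipher = ofb(key, iv, plaintext)
print(cfb(key, iv, cfb_cipher, decrypt=True) == plaintext)
print(ofb(key, iv, ofb_cipher) == plaintext)
```

In hardware terms, the same datapath serves both directions and only the source of the feedback block changes, which is consistent with the lower slice count reported for CFB/OFB.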

6. Performance comparisons

Table 1 lists the performance of the ASIP and of the direct FPGA implementations of ECB, CFB and OFB. Some differences are observed as follows:

• Slice number: A fixed 122 slices are needed in the ASIP regardless of the algorithm implemented. The slice number varies from 88 (CFB/OFB) to 134 (ECB) in the direct implementation.

• Throughput: The direct implementation is over 14 times faster than the ASIP.

• Software support: The ASIP needs software programming and support, whereas no software is involved in the direct implementation.

The same circuit implemented in ISE 9.2 shows higher slice counts, clock frequencies and throughputs than in ISE 7.1. The ISE 7.1 results are used for the comparisons because that version is closer to the ISE 6.3 used for the ASIP.

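Table 1. Performance comparison of the ASIP [6] and the direct FPGA implementations

| | ASIP [6] | ECB | ECB | CFB/OFB | CFB/OFB |
| --- | --- | --- | --- | --- | --- |
| Tool version | ISE 6.3 | ISE 7.1 | ISE 9.2 | ISE 7.1 | ISE 9.2 |
| Clock frequency (MHz) | 72 | 43 | 62 | 58 | 113 |
| Slices | 122 | 134 | 169 | 88 | 106 |
| No. of BRAMs | 2 | 3 | 3 | 3 | 3 |
| Throughput (Mbps) | 2.18 | 31.27 | 45.09 | 42.18 | 82.18 |
| Throughput/area (Mbps/slice) | 0.0178 | 0.2333 | 0.2668 | 0.4793 | 0.7752 |
| Software support | yes | no | no | no | no |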
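The direct-implementation throughputs in Table 1 appear consistent with one 128-bit block every 176 clock cycles (11 rounds of 16 byte-clocks each). That cycle count is an inference from the tabulated numbers and the round structure described earlier, not a formula stated in the text. A short check:

```python
BLOCK_BITS = 128
CYCLES_PER_BLOCK = 11 * 16  # assumption: 11 rounds x 16 clocks per round


def throughput_mbps(clock_mhz: float) -> float:
    """Throughput in Mbps for a design that finishes one block every 176 clocks."""
    return clock_mhz * BLOCK_BITS / CYCLES_PER_BLOCK


# (clock frequency in MHz, throughput in Mbps) as reported in Table 1
table_1 = {
    "ECB, ISE 7.1": (43, 31.27),
    "ECB, ISE 9.2": (62, 45.09),
    "CFB/OFB, ISE 7.1": (58, 42.18),
    "CFB/OFB, ISE 9.2": (113, 82.18),
}

for name, (clock_mhz, reported) in table_1.items():
    print(f"{name}: {throughput_mbps(clock_mhz):.2f} Mbps (Table 1: {reported})")

# Ratio of the slowest direct implementation to the ASIP figure of 2.18 Mbps
print(f"{throughput_mbps(43) / 2.18:.1f}x")
```

Under this assumption the slowest direct configuration (ECB, ISE 7.1) is about 14.3 times faster than the ASIP, which matches the "over 14 times" claim.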

CFB/OFB is not only favorable for low-area circuit design; it is also favorable for the encryption output when plaintext blocks are identical. In ECB, identical plaintext blocks encrypted under the same key generate the same ciphertext, which may be undesirable in some applications. This does not happen in CFB/OFB, because identical plaintexts become almost random after being combined with the cipher output of the IV. As shown in Fig. 7, the lower three plaintext blocks of (FFF...F) in the upper-left box become three repeated (identical) random sequences after ECB encryption, while completely different random sequences result in CFB or OFB.[9]
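The repeated-block behavior can be reproduced with a toy experiment on three identical all-FF blocks. As an explicit assumption, `ciph` is a hypothetical SHA-256-based stand-in for CIPHk( ), not AES, so the byte values differ from those in Fig. 7; only the pattern of repetition is being illustrated.

```python
import hashlib


def ciph(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for CIPHk( ), NOT AES; any deterministic keyed function works here.
    return hashlib.sha256(key + block).digest()[:16]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ecb_encrypt(key, blocks):
    # Each block is enciphered independently of its position.
    return [ciph(key, block) for block in blocks]


def cfb_encrypt(key, iv, blocks):
    out, feedback = [], iv
    for block in blocks:
        feedback = xor_bytes(block, ciph(key, feedback))  # Cj feeds the next block
        out.append(feedback)
    return out


def ofb_encrypt(key, iv, blocks):
    out, feedback = [], iv
    for block in blocks:
        feedback = ciph(key, feedback)  # cipher output feeds the next block
        out.append(xor_bytes(block, feedback))
    return out


key, iv = b"k" * 16, bytes(16)
blocks = [b"\xff" * 16] * 3  # three identical plaintext blocks

ecb_out = ecb_encrypt(key, blocks)
cfb_out = cfb_encrypt(key, iv, blocks)
ofb_out = ofb_encrypt(key, iv, blocks)
print(len(set(ecb_out)), len(set(cfb_out)), len(set(ofb_out)))
```

The printed values are the numbers of distinct ciphertext blocks per mode: ECB collapses the three identical inputs into a single repeated output, while the feedback in CFB and OFB makes each output block differ.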

Figure 6: (a) CFB mode, (b) OFB mode

Figure 7: Comparing ECB encryption with CFB and OFB

7. Concluding Remarks

This paper presents a direct, specific implementation of an 8-bit AES. The slice area (from around 90 to 200 slices) and the throughput vary depending on the algorithms and the implementation techniques used. The 3-BRAM implementation achieves 88 slices in CFB/OFB and 130 slices in ECB, which is better than or close to the 122 slices achieved by the ASIP, whose 15-instruction set can implement different algorithms without affecting its fixed slice count.

If the key expansion in our design were processed by software and the expanded keys stored in BRAM for later use by the main AES processing, the area could be further reduced to around 50 slices.

8. References

[1] NIST. Announcing the Advanced Encryption Standard (AES), FIPS 197. Technical report, National Institute of Standards and Technology, November 2001.

[2] http://www.nist.gov/aes/, (Access Date: 20 January, 2011).

[3] X. Zhang and K. K. Parhi, "High-speed VLSI architectures for the AES algorithm," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 9, pp. 957-967, Sep. 2004.

[4] Pawel Chodowiec, Kris Gaj, "Very Compact FPGA Implementation of the AES Algorithm", CHES 2003.

[5] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, J.-D. Legat, "Compact and efficient encryption/decryption module for FPGA implementation of the AES very well suited for small embedded applications", Information Technology Coding and Computing, 2004. Proceedings. ITCC 2004, Volume 2, Page(s):583 - 587 Vol.2, 2004.

[6] Tim Good, Mohammed Benaissa, "Very small FPGA application-specific instruction processor for AES", IEEE Trans. Circuit and System, vol. 53, no. 7, 2006.

[7] Morris Dworkin, "Recommendation for Block Cipher Modes of Operation" NIST Special Publication 800-38A 2001 Edition.

[8] Chi-Jeng Chang, Chi-Wu Huang, Hung-Yun Tai, Mao-Yuan Lin and Teng-Kuei Hue, "8-bit AES FPGA Implementation using Block RAM", The 33rd Annual Conference of the IEEE Industrial Electronics Society (IECON), Nov. 5-8, 2007.

[9] Chi-Wu Huang, Ying-Hao Tu, Hsing-Chang Yeh, Shih-Hao Liu, Chi-Jeng Chang, "Image Observation on the Modified ECB Operations in Advanced Encryption Standard", pp 264-269, i-Society International Conference, June 2011