Implementation of cryptology algorithms in high bit rate systems

  • Nikola M. Jaćimović Centre of applied mathematics and electronics, Belgrade
  • Bratislav Ž. Planić Centre of applied mathematics and electronics, Belgrade
Keywords: AES, Design optimization, VHDL, FPGA, encryption

Abstract


This paper analyzes the implementation of cryptology algorithms with the goal of achieving the best possible speed, so that communication between two participants can be protected with the smallest possible impact on the performance of the given network infrastructure. The paper explains the difference between hardware and software implementations of encryption algorithms. It presents the main characteristics of FPGA chips and the advanced techniques of the VHDL design language that were used to implement the crypto algorithms. The AES-256 encryption algorithm was selected for implementation since it has proven itself in both hardware and software versions. The development environment is the Xilinx ISE Design Suite, used together with the Xilinx Spartan-6 SP605 and Xilinx Kintex-7 KC705 development boards. All results correspond to devices that contain Spartan®-6 and Kintex®-7 chips.

Introduction

Today there is a growing need for faster exchange of information that must be protected during transmission. However, the "bottleneck" of the entire system is the process of encryption and decryption. The search for an optimal solution therefore raises three main questions: which method of implementation, hardware or software, gives better performance; which technologies should be used; and how the crypto algorithm should be coded.

Hardware and software cryptosystems

Despite the various advantages offered by software implementations of crypto algorithms, the NSA, for example, authorizes only hardware encryption, for reasons such as speed, security, and ease of installation (Schneier, 1996, pp. 192-194). Hardware implementation is therefore used in high bit rate systems, and programmable chips stand out as one of the most rewarding technologies for such an implementation.

FPGA logic circuits

An FPGA (Field Programmable Gate Array) consists of a large number of identical logic cells that are subsequently interconnected to achieve a desired function. Using these circuits significantly accelerates the design process, so that an implementation can reach the market quickly, and the development cost is lower than with other ways of programming hardware. The logic is designed in one of the hardware description languages (VHDL, Verilog, etc.). Regardless of the language used, the performance of the final design is affected most by how the various optimization techniques are applied when writing the code.

Application of FPGA chips in cryptography

Commercial pressure to reduce design costs, lower risk, and reach the market faster has increased the use of FPGA programmable circuits, so that these chips are now used not only in prototype devices (to be replaced by ASIC chips in production) but also in mainstream production. In crypto-systems, such chips offer multiple advantages (Wollinger, et al., 2004): algorithm agility, algorithm upload, algorithm modification, throughput and cost efficiency.

VHDL optimization

There are three basic definitions of speed, depending on the context of the problem: throughput (the amount of data processed per clock cycle), latency (the time that elapses between receiving an input and producing the corresponding output) and timing (the logic delays between sequential elements) (Kilts, 2007).

To obtain higher throughput, the concept of pipelining is introduced. It involves dividing a complex operation into several simpler ones, so that each step processes a small amount of work in a shorter period of time. This increases the maximum clock frequency, reduces the synthesis time, and increases the throughput of the system. To achieve a design with as little latency as possible, data must flow from the input to the output as quickly as possible, which is achieved by optimizing the time required to process intermediate results. Clock speed can be improved by methods such as adding registers, using parallel structures, flattening the logic structure, balancing registers, and redistributing the data flow.
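The effect of adding and balancing registers on clock speed can be illustrated with a small Python model. This is only a sketch: the function names and the delay values in nanoseconds are illustrative assumptions, not measurements from this design. The model assumes that the clock period is set by the slowest combinational stage between two registers.

```python
def max_clock_mhz(stage_delays_ns, register_overhead_ns=0.0):
    """Highest clock frequency in MHz; the slowest stage sets the period."""
    period_ns = max(stage_delays_ns) + register_overhead_ns
    return 1000.0 / period_ns


def latency_ns(stage_delays_ns, register_overhead_ns=0.0):
    """Input-to-output time: one full clock period per pipeline stage."""
    period_ns = max(stage_delays_ns) + register_overhead_ns
    return period_ns * len(stage_delays_ns)


# Hypothetical delays, chosen only for illustration.
single_stage = [8.0]       # all logic between two registers
unbalanced = [5.0, 3.0]    # one extra register, badly placed
balanced = [4.0, 4.0]      # same logic after register balancing

for name, stages in [("single", single_stage),
                     ("unbalanced", unbalanced),
                     ("balanced", balanced)]:
    print(name, max_clock_mhz(stages), "MHz,", latency_ns(stages), "ns")
```

In this model the balanced two-stage split doubles the clock frequency without changing the latency in nanoseconds, while the unbalanced split gains less frequency and adds latency, which is why register balancing appears among the listed methods.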

The basic concept of the AES algorithm

The AES (Advanced Encryption Standard), one of the most popular cipher algorithms today, has proven itself in both software and hardware variants. In this paper we used AES-256.

A review of past solutions

A number of papers deal with the implementation of this crypto algorithm in FPGA or ASIC chips. In addition to the basic implementation, the authors mainly address two types of optimization: area optimization and timing optimization.

Own solution

For an effective implementation of the AES algorithm, all optimization techniques of VHDL design must be taken into account, with the starting assumption that results equal to or better than those currently available can be achieved. In this paper, the ECB mode and AES-256 were used so that the results obtained by optimizing the design can be observed more clearly. The algorithm was developed in the Xilinx ISE 14.2 programming environment. The timing parameters used in this paper are those reported by the development environment.

Structure of the design

The algorithm is divided into three parts – encryption block, decryption block and key-expansion block. Since the key-expansion is executed once, at the beginning of the algorithm, and since the resulting round keys do not change during the operation, these subkeys are stored in the internal ROM.

Encryption block

The main purpose of this block is to encrypt the input signals and send the resulting ciphertext to the output. It is important to note that the ciphertext block on the output is valid for only one clock cycle, which should be enough to forward it to the line.

Key-expansion block

This part of the algorithm is designed so that the key matrix holds the subkeys of all rounds. The waiting time for a valid subkey is thus reduced to a minimum.
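The idea of computing all subkeys once and then only reading them can be illustrated in software. The following Python sketch expands an AES-256 key into its 15 round keys as specified in FIPS-197 and keeps them in a list indexed by round. It is a reference model under stated assumptions, not the VHDL code of this design; all function names are hypothetical, and the S-box is computed rather than stored as a table only to keep the example short.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):  # XOR with the four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def rot_word(word):
    """Rotate a 32-bit word left by one byte."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def expand_key_256(key):
    """Return 15 round keys, each a list of four 32-bit words."""
    if len(key) != 32:
        raise ValueError("AES-256 requires a 32-byte key")
    words = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(8)]
    rcon = 1
    for i in range(8, 60):
        temp = words[i - 1]
        if i % 8 == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)
        elif i % 8 == 4:
            temp = sub_word(temp)
        words.append(words[i - 8] ^ temp)
    return [words[4 * r:4 * r + 4] for r in range(15)]


# Example key from FIPS-197, Appendix A.3.
key = bytes.fromhex("603deb1015ca71be2b73aef0857d7781"
                    "1f352c073b6108d72d9810a30914dff4")
round_keys = expand_key_256(key)
print(len(round_keys))               # 15
print(f"{round_keys[2][0]:08x}")     # 9ba35411
```

Once `round_keys` is filled, each round only performs a lookup by round index, which mirrors the role of the stored subkeys described above.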

Speed of the VHDL implementation of the AES

This paper presents a timing analysis of each block separately as well as their optimization.

Time analysis of the key-expansion block

The first, unoptimized version of the key_expander is designed as the state machine shown in Figure 5. Its maximum frequency is 102.558 MHz on the Spartan-6 and 226.827 MHz on the Kintex-7 chip. It is easy to observe that the "bottleneck" of this system is the state key_exp_state. Optimizing it by breaking it into two substates (key_exp_state1 and key_exp_state2), each of which generates exactly one subkey, significantly improves the speed: the maximum frequency rises to 154.616 MHz on the Spartan-6 (Figure 6) and 306.607 MHz on the Kintex-7. The best results are obtained by so-called sequential coding, with a maximum frequency of 269.920 MHz on the Spartan-6 and 336.655 MHz on the Kintex-7.

Time analysis of the encryption block

The first version of the code, whose state machine is shown in Figure 7, runs successfully at 212.326 MHz on the Spartan-6, i.e. the maximum speed between two crypto devices can be up to 27.2 Gbps. On the Kintex-7, the maximum frequency is 274.907 MHz (35.2 Gbps). Since the most complex combinational logic is contained in the main round, pipelining is introduced by dividing each state of the main round into two substates. The state machine of this design is shown in Figure 8. After this optimization, the speed increases by about 16% (Spartan-6) or 23% (Kintex-7), which in theory supports telecommunication systems of up to 31.6 Gbps or 43.3 Gbps, respectively. The best result was again obtained with the sequential technique of describing the logic: 273.459 MHz on the Spartan-6, at which data can be encrypted at 35 Gbps (about 4.4 GB/s), and 375.340 MHz on the Kintex-7, or 48 Gbps (6 GB/s).
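The reported bit rates are consistent with one 128-bit AES block leaving the encryption block per clock cycle. This relation is inferred from the figures above rather than stated explicitly, so the following Python sketch should be read under that assumption; the function names are illustrative.

```python
AES_BLOCK_BITS = 128  # AES block size; assumed to be produced once per clock


def throughput_gbps(f_max_mhz, bits_per_cycle=AES_BLOCK_BITS):
    """Throughput in Gbit/s for a design emitting bits_per_cycle each clock."""
    return f_max_mhz * 1e6 * bits_per_cycle / 1e9


def speedup_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Maximum frequencies in MHz as reported for the encryption block.
reported_mhz = {
    "Spartan-6, first version": 212.326,
    "Kintex-7, first version": 274.907,
    "Spartan-6, sequential coding": 273.459,
    "Kintex-7, sequential coding": 375.340,
}

for name, f_mhz in reported_mhz.items():
    gbps = throughput_gbps(f_mhz)
    print(f"{name}: {gbps:.1f} Gbps = {gbps / 8:.1f} GB/s")

# Gain of the pipelined version, from the reported bit rates.
print(round(speedup_percent(27.2, 31.6)), round(speedup_percent(35.2, 43.3)))
```

The same relation reproduces the 16% and 23% gains from the reported bit rates of the pipelined version.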

Conclusion

This paper considers the implementation of the AES algorithm on FPGA chips. The results show how strongly the way VHDL is used affects performance. For a design that must operate at higher frequencies, pipelining is an indispensable concept in the process of writing the code.

 

Published
2015/07/27
Section
Original Scientific Papers