AES and ARM processors

  • Danijela D. Protić, The General Staff of the Serbian Army, Department of Telecommunication and Informatics (J-6), Center for Applied Mathematics and Electronics, Belgrade
Keywords: Processors, encryption, AES

Abstract


The need for information security poses serious problems in the development of portable devices, which have limited memory and a limited power budget. If encryption coprocessors are added to core processors, such devices grow in size, become inflexible, and their price increases several times. It is also well known that data encryption algorithms are memory demanding and that, because of the large number of operations executed during encryption and decryption, coprocessors often slow down core processors. For one of the cryptographic standards, AES, the NIST accepted the Rijndael block algorithm; the input and output data blocks are 128 bits long, and the cipher key can be 128, 192 or 256 bits long. Owing to their low power consumption, 32-bit architecture and fast instruction execution, ARM processors can implement the AES algorithm without burdening the main processes of the system in which they are used. The ARM technology is protected as intellectual property, not as a processor design. As a result, many manufacturers have developed their own ARM-based products, and over 2 billion chips have been produced so far. This paper presents the possibilities for improving the performance of the AES algorithm using the latest version of the ARM processor.

Introduction

The growing need for information security requires fast execution of cryptographic algorithms. With the increasing use of wireless and portable devices in everyday life, cryptographic protection becomes crucial. The task is extremely challenging on portable devices, which have limited processing power, memory and energy. In the traditional approach, installing demanding cryptographic coprocessors makes the core processor inflexible. The alternative is to integrate new, simple instructions that accelerate the processor and support the cryptographic operations.

In the 1980s, the Acorn company developed the first high-performance, low-power microprocessor for commercial use, the ARM. Unlike other manufacturers, ARM Holdings did not register the ARM processor as a product but as intellectual property, which is probably why many other companies, such as Intel, Motorola and Texas Instruments, developed their own ARM-based products, with a total of about 2 billion chips produced. ARM processors are suitable for portable battery-powered systems such as smart cards and consumer electronics. The ARM architecture is designed to support 32-bit embedded systems that can be used in a variety of devices, from small sensors on the assembly line to NASA control systems. The elements of ARM-based devices are the ARM processor, controllers, peripherals and buses.

In 2000, the US National Institute of Standards and Technology (NIST) announced the result of the competition for a new data protection algorithm to replace the Data Encryption Standard (DES). The NIST required that the submitted algorithms be symmetric block ciphers. Rijndael was chosen as the best of the 15 submitted algorithms because of its good combination of security, efficiency, ease of implementation and flexibility. In 2001, the NIST standardized the Advanced Encryption Standard (AES) in the category of standards for computer security. The standard came into effect in May 2002.

ARM processors speed up the execution of the AES algorithm and relieve main processors of cryptographic processing. This paper presents the advantages of implementing AES on ARM processors.

AES

AES is the standardized form of Rijndael, an iterative block algorithm for data encryption whose input and output blocks are 128 bits long. The cipher key has a variable length of 128, 192 or 256 bits. Each encryption round consists of four byte-oriented transformations: byte substitution using a substitution table, cyclic shifting of the rows of the state matrix by different offsets, mixing of each column of the state matrix, and round key addition. Each transformation has an inverse transformation for decryption, and the decryption steps run in the opposite order to the encryption steps. Encryption begins by assigning the plain text (the unencrypted input data) to the state matrix, which is organized as a matrix of 4x4 bytes. The key length, expressed in 32-bit words, determines the number of rounds (Nr): 10, 12 or 14 rounds for 128-, 192- and 256-bit keys, respectively. Decryption proceeds in the opposite direction, from the ciphered data to the plain text.
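The relation between key length, number of rounds and the order of the transformations can be sketched in a few lines of Python. This is only an illustrative outline, not an implementation of the cipher: the function names are hypothetical, and the transformation names (SubBytes, ShiftRows, MixColumns, AddRoundKey) are taken from FIPS 197 rather than from the text above.

```python
# Sketch of the AES round structure (names of helpers are hypothetical).
# Transformation names follow FIPS 197.

INVERSE = {
    "SubBytes": "InvSubBytes",
    "ShiftRows": "InvShiftRows",
    "MixColumns": "InvMixColumns",
    "AddRoundKey": "AddRoundKey",  # XOR with the round key is its own inverse
}


def rounds_for_key(key_bits):
    """Return the number of rounds Nr for a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6        # 10, 12 or 14 rounds


def encryption_schedule(key_bits):
    """List the transformations applied to the state during encryption."""
    nr = rounds_for_key(key_bits)
    steps = ["AddRoundKey"]  # initial round key addition
    for _ in range(nr - 1):  # Nr - 1 full rounds
        steps += ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    steps += ["SubBytes", "ShiftRows", "AddRoundKey"]  # final round, no MixColumns
    return steps


def decryption_schedule(key_bits):
    """Decryption applies the inverse transformations in reverse order."""
    return [INVERSE[s] for s in reversed(encryption_schedule(key_bits))]


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, rounds_for_key(bits), len(encryption_schedule(bits)))
```

The last round omits the column mixing step, which is why the decryption schedule starts with a round key addition followed directly by the inverse row shift.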

ARM processors

The choice of a suitable processor for integration in an embedded system depends on the application domain. Applications can be arithmetic intensive or control intensive, which can be a problem for mass-market devices. With that in mind, different classes of processors are used, such as microcontrollers, digital signal processors (DSPs), application-specific processors, multimedia processors and RISC processors. ARM processors are based on the RISC design; what distinguishes them is their purpose, since an ARM processor is meant to be part of a larger, embedded system. In such a system, the focus is not only on the processor's power but also on its efficiency. It is essential to achieve maximum system performance while keeping power consumption low.

ARM instructions are 32 bits long and execute in three to five pipeline steps: fetch, decode, execute, memory access and write-back. The architecture is characterized by a load-store design, three-address data processing instructions, conditional execution of instructions, powerful load-store multiple register instructions, and the ability to perform a shift and an arithmetic or logic operation in a single clock cycle. The instruction set can be extended through the coprocessor instruction set, and a compact 16-bit Thumb instruction set is also available.
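The effect of combining a shift with a logic operation in one instruction can be modelled in Python. The sketch below is an assumption-laden illustration: the helper names `ror32` and `eor_ror` are hypothetical, and the code only mimics the result of an ARM instruction such as `EOR r0, r1, r2, ROR #8` on 32-bit values, not its timing.

```python
# Model of a 32-bit exclusive OR whose second operand is rotated first,
# as the ARM barrel shifter allows within a single data processing instruction.

MASK32 = 0xFFFFFFFF


def ror32(word, bits):
    """Rotate a 32-bit word right by the given number of bits."""
    bits %= 32
    word &= MASK32
    return ((word >> bits) | (word << (32 - bits))) & MASK32


def eor_ror(rn, rm, rotation):
    """Return rn XOR (rm rotated right by `rotation` bits)."""
    return (rn ^ ror32(rm, rotation)) & MASK32


if __name__ == "__main__":
    print(hex(ror32(0x12345678, 8)))
    print(hex(eor_ror(0xFFFFFFFF, 0x12345678, 8)))
```

Byte rotations of 32-bit words of this kind occur in the AES key expansion and row shifting, which is one reason the feature is useful for the algorithm.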

ARM Holdings has designed a number of processors, but this paper discusses the ARM7 and ARM9 processors from the ARMv5EJ series, whose improvements were Java acceleration, better multiprocessor instructions and new multimedia instructions.

Improving the performance of the AES algorithm using ARM processors

Software implementation of cryptographic algorithms generally offers the highest degree of flexibility, but it can lead to poor performance of the embedded system in terms of processing power, memory and energy. A direct way of overcoming this problem is to integrate coprocessors that relieve the main processor of cryptographic processing. Depending on the application, a coprocessor can reduce the memory required for the execution of cryptographic algorithms, and additional instructions can reduce the total processing time and energy. Cryptographic hardware is typically much faster and more energy efficient than software running on the main processor. However, choosing the right combination of hardware and software design is a problem that requires the cooperation of engineers of different profiles in order to reach the required solution and meet market demands.

ARM processors can improve the characteristics of the AES algorithm at both hardware and software levels in various ways, such as:

- architecture (e.g. C and assembly code for 64-bit processors perform better than Java code for 32-bit processors),

- architecture expansion (e.g. fewer memory accesses, a smaller number of instructions),

- optimization of the algorithms for key expansion, encryption and decryption (e.g. transposition of the state matrix),

- improvement of existing instructions and addition of new ones for optimization (e.g. reducing code size),

- increasing the speed of algorithm execution (e.g. Intel added a new set of instructions, Intel AES-NI),

- reduction of energy consumption (e.g. using 32-bit low-power ARM processors), etc.
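The transposition of the state matrix mentioned in the list above can be illustrated with a short Python sketch. It assumes, purely for illustration, that each row of the state is packed into one 32-bit word, so that the row shifting step reduces to word rotations of the kind a 32-bit processor performs cheaply. The helper names are hypothetical and the layout is one possible choice, not necessarily the one used in the cited implementations.

```python
# Sketch: with a transposed (row-wise) state layout, shifting row r
# left by r bytes becomes a left rotation of one 32-bit word by 8*r bits.

MASK32 = 0xFFFFFFFF


def pack_rows(block):
    """Pack a 16-byte block into four 32-bit words, one word per state row.

    The state is filled column by column, so row r holds bytes
    block[r], block[r + 4], block[r + 8] and block[r + 12].
    """
    if len(block) != 16:
        raise ValueError("an AES block is 16 bytes long")
    return [
        (block[r] << 24) | (block[r + 4] << 16) | (block[r + 8] << 8) | block[r + 12]
        for r in range(4)
    ]


def rol32(word, bits):
    """Rotate a 32-bit word left by the given number of bits."""
    bits %= 32
    return ((word << bits) | (word >> (32 - bits))) & MASK32


def shift_rows_transposed(rows):
    """Apply the row shifting step to a state stored as four row words."""
    return [rol32(word, 8 * r) for r, word in enumerate(rows)]


if __name__ == "__main__":
    state = pack_rows(bytes(range(16)))
    print([hex(w) for w in shift_rows_transposed(state)])
```

In the usual column-wise layout the same step moves individual bytes between four different words, so the transposed form trades a one-time repacking cost for simpler per-round operations.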

Conclusion

Devices with limited processing power, memory and energy, which is characteristic of all battery-powered portable modules that require high-speed data processing, need chips that meet these constraints and at the same time provide high operational quality. ARM processors are almost ideal for the realization of such devices and, since the ARM is protected as intellectual property rather than as a design, a large number of companies use this technology for smart cards, consumer electronics, sensors in production systems, and the like. However, the ARM is not only designed to reduce the energy consumption or the dimensions of devices; its characteristics also contribute to efficiency, particularly because it can be a part of an embedded system. The 32-bit ARM instructions execute in three to five steps; they are used for the processing, storage and transmission of data.

Due to its characteristics, the ARM is often used to speed up cryptographic algorithms. One of them is the NIST's AES, the standardized form of Rijndael in the category of standards for information security; the input and output data blocks are 128 bits long and the key lengths are 128, 192 and 256 bits. The AES can be implemented in software, it is flexible, and it has a good combination of security and efficiency. The ARM provides better execution of the AES through software and hardware optimization at the level of application software, through architecture expansion, through the optimization of the algorithms for key expansion, encryption and decryption, through adding new instructions for AES optimization and enhancing the existing ones, through increasing the computational power, and through reducing energy consumption.

 


Published
2013/12/06
Section
Professional Papers