GSM OMC <Name>

A P-box simply reorders the bits to different locations; this is equivalent to wire crossing. Later in DES you will see Expansion and Compression P-Boxes.

Figure 1 P-Box

S-boxes

An AxB bit S-box replaces an A bit input with a B bit output.

Figure 2 S-Box

Substitution-Permutation Network

Also called a mixing transformation A Substitution-Permutation Network has sets of S-boxes linked by P-boxes.

Figure 3 Substitution-Permutation Networks, with the Avalanche Characteristic

Algorithms

Symmetric Algorithms

There are basically two types of key-based algorithms: symmetric and public-key. In symmetric algorithms the encryption key can be calculated from the decryption key easily (e.g. the reverse of the encryption key). So essentially with Symmetric Algorithms as long as the encrypted data is to remain a secret so must the key(s). There are two types of symmetric algorithm: Stream ciphers and Block ciphers. Both algorithms are used in GSM are symmetric ciphers, A5 is a symmetric stream cipher and A3 and A8 are example of a symmetric block cipher.

Block Ciphers

Block Ciphers operate on plaintext in groups of bits (blocks). A typical block size is 64 bits. With a block Cipher, the same plaintext block will always encrypt to the same ciphertext block, using the same key. DES is an example of a block cipher. DES encrypts 64-bit blocks of plaintext and products 64-bit of ciphertext. The key length is 56 bits the key can be any 56–bit number and can change at any time, all security rests in the key. DES applies a substitution followed by a permutation (this is known as a round) to each block 16 times.

Figure 4 Block cipher (‘M’= plaintext ‘C’ = ciphertext)

Public-Key Algorithms

Public Key encryption is not use in the GSM system, but for completeness I will briefly discuss it. Public-Key algorithms or asymmetric algorithms use two separate keys; a public key and a private key. The private key cannot be generated from the public key. The public key as its name suggests is available to everyone and is usually stored in a key database. It is used for encryption so anyone who wants to send you a message can download your public key, encrypt the message and send it to you. On receipt of the ciphered message you can decrypt it by use of your private key. All Public-Key algorithms are slower (approximately 1000 times slower_[1]) then there symmetric counter parts.

Stream Ciphers

Stream Ciphers operate on streams of plaintext and ciphertext one bit or byte at a time. Using a stream cipher, the same plaintext bit or byte will encrypt to a different bit or byte every time it is encrypted. A stream cipher uses a Keystream Generators in conjunction with the plain text. The Keystream Generator produces a "Random" string of bits and these are used to generate a bit with each bit of the message, for example each bit of the message could be XORed with a corresponding bit from Keystream generator.

The strength of this system lies totally in the "randomness" of the Keystream that must be generated at both the Encrypting and Decrypting end of the communication channel. If it is totally random then the Cipher is as perfectly secure as a one-time pad _[1]. If it is for example a repeating set of 8 digits it will be easily broken. It is vital that stream ciphers have session keys. The output of the Keystream generator is a function of this key. This means that if the Keystream for one communication is compromised, all other communications are still secure. Stream ciphers may be used to encrypt streams of communications traffic.

Figure 5 Stream cipher

Cryptanalysis

The objective of cryptanalysis is to "break the code", i.e. to recover the plaintext from the ciphertext, and preferably also to find which key was used.

Kerckhoff's Principle.

Kerckhoff's Principle_[16] is the general assumption that an opponent, "Oscar", knows which cryptosystem is being used. This is a reasonable assumption for the security reason. There are different levels of attacks on a cryptosystem.

Ciphertext-only attack: The assumption is the opponent possesses a string of ciphertexts, y.

Known plaintext. The opponent possesses a string of plaintexts, x, and the corresponding string of ciphertexts, y.

Chosen plaintext. The opponent has a temporary access to the encryption machinery, so that he can choose a plaintext string x and construct the corresponding ciphertext string y.

Chosen ciphertext. The opponent has a temporary access to the decryption machinery, so that he can choose a ciphertext string y and construct the corresponding plaintext string y.

Work Factor

Work Factor measures what is needed to carry out a specific analysis or attack against a cryptographic Algorithm. The attack is conducted under a given set of assumptions, which include the information available to achieve a predetermined goal, such as the recovery of the plaintext or key _[14]. Good Algorithms maximize the work need to compromise them.

In practice, there is no universally accepted, fixed set of parameters used to express the work factor. However, whatever it is measured in e.g. Hours, Number of Mathematical operations is usually converted into US Dollars.

Differential Cryptanalysis

Differential cryptanalysis is a type of attack that can be mounted on iterative block ciphers. S. Murphy first introduced this technique in an attack on FEAL-4 (Fast Data Encipherment Algorithm, 4 for rounds)_[1] but this method was later improved and perfected by Biham and Shamir who used Differential Cryptanalysis to attack DES _[30]. Differential cryptanalysis looks specifically at ciphertext pairs: pairs of ciphertext whose plaintexts have particular differences, but are encrypted under the same key. By careful analysis of the available data, probabilities can be assigned to each of the possible keys and eventually the most probable key is identified as the correct one.

Differential cryptanalysis has been used against a great many ciphers with varying degrees of success. In attacks against DES, its effectiveness is limited by carefully engineered S-boxes designed for DES in the mid-1970s. Differential cryptanalysis has also been useful in attacking other cryptographic algorithms such as hash functions.

No. of Rounds

Chosen Plaintext

Known Plaintext

Analyzed Plaintext

Complexity of Analysis

2¹⁴

2²⁴

2³¹

2³⁹

2⁴⁷

2³⁸

2⁴⁴

2⁴³

2⁴⁷

2⁵²

2⁵¹

2⁵⁶

2⁵⁵

2¹⁴

2²¹

2²⁹

2⁷

2³⁶

2⁹

2³²

2¹⁵

2³²

2²¹

2³²

2²⁹

2³⁷

Figure 6 Differential Cryptanalysis Attacks against DES _[1]

Linear cryptanalysis

Matsui and Yamagishi first devised linear cryptanalysis also on an attack on FEAL _[32]. It was extended by Matsui to attack DES _[33]. Linear cryptanalysis is a known plaintext attack and uses a linear approximation to describe the behavior of the block cipher. Given sufficient pairs of plaintext and corresponding ciphertext, bits of information about the key can be obtained and increased amounts of data will usually give a higher probability of success.

There have been a variety of enhancements and improvements to the basic attack. Langford and Hellman introduced an attack called "differential-linear cryptanalysis", which combines elements of differential cryptanalysis with those of linear cryptanalysis _[34]. Also, Kaliski and Robshaw showed that a linear cryptanalytic attack using multiple approximations might allow for a reduction in the amount of data required for a successful attack _[35].

Weak keys

Weak keys are secret keys with a certain value for which the block cipher in question will exhibit certain regularities in encryption or a poor level of encryption. For instance, with DES, there are four keys for which encryption is exactly the same as decryption _[1]. This means that if one were to encrypt twice with one of these weak keys, then the original plaintext would be recovered. For IDEA (International Data Encryption Algorithm), there is a class of keys for which cryptanalysis is greatly facilitated and the key can be recovered. However, in both these cases, the number of weak keys is such a small fraction of all possible keys that the chance of picking one at random is exceptionally slight. In such cases, they pose no significant threat to the security of the block cipher when used for encryption.

Of course, for other block ciphers, there might well be a large set of weak keys for which the chance of picking a weak key is more probable. In such a case, the presence of weak keys would greatly weaken the security of the cipher_[40].

Weak key Value (with parity bits)

Actual Key

0101 0101 0101 0101

1F1F 1F1F 0E0E 0E0E

E0E0 E0E0 F1F1 F1F1

FEFE FEFE FEFE FEFE

0000000 0000000

0000000 FFFFFFF

FFFFFFF 0000000

FFFFFFF FFFFFFF

Figure 7 DES Weak Keys_[1]

Data Encryption Standard (DES)

Description of DES

DES has been a worldwide standard for 25 years. DES is an NSA (National Security Agency)-evaluated algorithm that was made public. Although it is showing signs of old age it is still reasonably secure. DES has been crypto analysed since its release so it is regarded as an excellent starting place when studying cryptology. DES is a symmetric cipher, specifically a 16-round Feistel cipher (see below) and was originally designed for implementation in hardware _[1]. It is a block cipher that operates on 64-bit blocks. Its key length is 56-bits as its 64-bit key is reduced to 56-bits as every 8^th bit is used for parity. It structure is as follows.

_{Initial Permutation

This Permutation splits the 64 bit block into two 32-bit Blocks, it takes the 58^th bit and puts it into the first left space, then the 50^th bit and puts it in the second and then it takes the 42^nd bit…and so on.
Left block; 58,50,42,34,26,18,10,2,60,52…..8^th bit
Right Block; 57,49,41,33,25,17,9,1,59,51……7^th bit
This separation is to facilitate the Feistel structure of the cipher. In the classic Feistel structure, half of the data block is used to modify the other half of the data block, and then the halves are swapped.

Figure 8 A round of a Feistal Cipher

The Key Transformation

Each round has an individual key of 48 bits. The individual keys are generated by dividing the 56-bit key it into two 28-bit parts. Then the bits are circularly shifted left one or two bits depending on the round. Then 48 out of the 56 bits are selected by a compression permutation _[1]. As the key is shifted before each round, the 8 bits that are ignored are different for each round.
Rounds 1,2,9,16 are one shift and all the others are shifted left twice.

Round No.

Number of circular shifts

1,2,9,16

One

3,4,5,6,7,8,10,11,12,13,14,15

Two

Figure 9 Number of circular shifts for each round
The Expansion Permutation

This operation expands the right half of the plaintext from 32-bits to 48-bits so that it can be XORed with the round key. The 32 bits are split into 8 groups of four bits, where the first and last bit of the group of four appear twice in the new 48 bit number and the two middle bits only appear once. For example the 3^rd bit of the 32-bit number would become the 4^th bit, but the 4^th would become both the 5^th and 7^th bit of the new 8 – bit number.}

Figure 10 The Expansion Permutation Box in DES

Figure 11 The structure of DES _[1]

S-Box Substitution

After the round key is XORed with the expand plaintext block, the 48-bit result is divide into 8 6-bit sub blocks. Each block is operated on by a different s-block. The six-bit number that is inputted is used to in a look-up table fashion to get a four-bit number.

The first and last bit gives the row x number and the other four bits give the column number y. What every 4-bit number is at row x and column y is output as part of the 32 bit number.

P-box Permutation

A straight permutation is preformed. Each of the 32 bits is just reordered in a new location. For example bit 16 becomes bit 1 and bit 7 becomes bit 2 (see _[1]For full listing). Finally the result of the P-box permutation is XORed with the left half of the initial 64-bit block as per the Fiestal filter structure. Then the left and right halves are switched and another round begins.

Figure 12 The structure of a round in DES _[1]

Final Permutation.

The inverse of the initial permutation just reorders all the bits to the original places.

Decrypting DES

With DES it is possible to use the same function to encrypt or decrypt a block. The only difference is that the keys must be used in the reverse order.

Attacks on DES

No easy attack on DES has been discovered, despite the efforts of researchers for many years. The obvious method of attack is a brute-force exhaustive search of the key space; this process takes 2⁵⁵steps on average. In 1980 Hellman_[37] showed a time-memory trade-off that allows improvement over exhaustive search if memory space is plentiful. There were also accusations the NSA had intentionally weakened DES. In 1998 Wiener estimated that a DES cracker could be built for one million dollars to give an average time of 35 minutes to break the algorithm _[38].

The first attack on DES that was computational better than an exhaustive search was announced by Biham and Shamir using differential cryptanalysis _[39]. This attack requires the encryption of 2⁴⁷ chosen plaintexts. Although it was a breakthrough, this attack is not practical because of both the large data requirements and the difficulty of mounting a chosen plaintext attack. Biham and Shamir have stated they consider DES secure.

More recently Matsui _[41] has developed another attack, known as linear cryptanalysis (as mentioned in the previous chapter). By means of this method, a DES key can be recovered by the analysis of 2⁴³ known plaintexts. The first experimental cryptanalysis of DES, based on Matsui's discovery, was successfully achieved in an attack requiring 50 days on 12 HP 9735 workstations.

On Tuesday, January 19, 1999, Distributed Net, a worldwide coalition of computer enthusiasts, worked with EFF's (The Electronic Frontier Foundation) DES Cracker and a worldwide network of nearly 100,000 PCs on the Internet, to win RSA Data Security's DES Challenge III in a record-breaking 22 hours and 15 minutes. EFF DES Cracker, which was built for less than $250,000 tested 245 billion keys per second in conjunction with the computer network _[42].

Figure 13 Paul Kocher, The DES Cracker and a "Deep Crack" chip

Paul Kocher, the machine's principal designer, displays one of the boards holding 64 custom search microchips. The six cabinets (behind) house 29 boards whose searching is coordinated by a PC (left). The machine, tests over 90 billion keys per second, taking an average of less than 5 days to discover a DES key. On the right one of over 1800 custom microchips, perform the DES key search. Each chip contains 24 search units that test possible keys against dual ciphertexts _[42].

The consensus of the cryptographic community is that DES is not secure, simply because 56 bit keys are vulnerable to exhaustive search and not that there is a weakness within the algorithm. In fact, DES is no longer the standard for U.S. government use; triple-DES is in use until AES (see Section 3) is ready for general use _[36].

The Advanced Encryption Standard (AES)

Introduction

The Advanced Encryption Standard (AES) will be a new Federal Information Processing Standard (FIPS) Publication that will specify a cryptographic algorithm for use by U.S. Government organizations to protect sensitive (unclassified) information.

The AES is being developed to replace DES, but the NIST (National Institute of Standards and Technology) anticipates that Triple DES will remain the approved algorithm for the foreseeable future. Rijndael has been chosen as the AES algorithm. Rijndael's combination of security, performance, efficiency, ease of implementation and flexibility make it an appropriate selection for the AES _[26].

Specifically, Rijndael appears to be consistently a very good performer in both hardware and software across a wide range of computing environments regardless of its use in feedback or non-feedback modes. Its keysetup time is excellent, and its key agility is good. Rijndael's very low memory requirements make it very well suited for restricted-space environments, in which it also demonstrates excellent performance. Rijndael's operations are among the easiest to defend against power and timing attacks _[26].

The following Chapter will detail both Rijndael and the four 2^nd Round finalists. I will concentrate on Rijndael but also give a brief description of the other four algorithms.

The final section of this chapter is a short break down why Rijndael over the other finalists.

Round 2 finalists

No.	Name	Designers
1.	Rijndael	Joan Daemen, Vincent Rijmen
2.	Mars	IBM
3.	RC6^TM	RSA Laboratories
4.	Serpent	Ross Anderson, Eli Biham, Lars Knudsen
5.	Twofish	Bruce Schneier, John Kelsey, Doug Whiting, David Wagner, Chris Hall, Niels Ferguson

Figure 14 AES Round 2 Finalists

Below is a summary of each of the finalist in alphabetical order;

MARS

MARS is the IBM's entrant into the AES competition. As IBM is the company that designed DES, this alone makes this entry of particular interest.

The overall structure of MARS is as follows;

First, key material is XORed with the whole block (pre-whitening see glossary). Then, eight rounds of a transformation similar to DES are applied, but that transformation is fixed, and without any part that is affected by a key. These eight rounds of unkeyed transformation seem to be criticised by many sources _{[21 & 27]}, believing that they are wasteful and should not be used so early in the algorithm.

Then, there are sixteen rounds, which constitute the "cryptographic core" of the cipher. One 32-bit word is used to modify the other three, by being split into three copies of itself, and subjected to various manipulations, including one multiplication by key material, one S-box lookup, and two data-dependent rotations.

Then we have another unkeyed transformation, and another XOR of key material.

There are 40 32-bit words of subkeys, which are generated by a kind of shift-register method from the key.

The unkeyed rounds use two 8x32 bit S-boxes, addition, and the XOR operation. In addition to those elements, the keyed rounds use 32-bit key multiplication, data-dependent rotations, and key addition. Both the mixing and the core rounds are modified Feistel rounds in which one fourth of the data block is used to alter the other three fourths of the data block _[22].

Figure 15 The structure of Mars

RC6

Devised by Dr. Ronald C. Rivest, RC6 is based on Feistel rounds; but not Feistel rounds operating between the two halves of the block. Instead, the Feistel rounds operate between pairs of quarters of the block, and they are interlocked by the exchange of some data. 20 rounds were specified for the AES submission.

Figure 16 Summary of Encryption with RC6

Figure 17 - Summary of Decryption with RC6

The round function of RC6 uses variable rotations that are regulated by a quadratic function of the data. Each round also includes 32-bit modular multiplication, addition, XOR, and key addition. Key addition is also used for pre- and post-whitening _[24]. The design of the block cipher is such that the number of rounds, the size of the key, and the size of the block, are all flexible _[27].

RC6 uses 44 subkeys, numbered S0 to S43, each one 32 bits long. The block to be enciphered is divided into four 32-bit integers, A, B, C, and D. The first four bytes enciphered form A, and the convention is little-endian; the first byte enciphered becomes the least significant byte of A. Each round of RC6 uses two subkeys; the first one uses S2 and S3, and successive rounds use successive subkeys _[24].

Figure 18 The structure of RC6

Rijndael

Rijndael is a substitution-linear transformation network designed to use only simple whole-byte operations with 10, 12 or 14 rounds, depending on the key size_[17]. It provides extra flexibility over that required of an AES candidate, in that both the key size and the block size may be chosen to be any of 128, 192, or 256 bits. However, the variations of Rijndael which act on larger block sizes apparently will not be included in the actual standard _[27].

Rijndael works as follows; A data block to be processed using Rijndael is partitioned into an array of bytes, and each of the cipher operations is byte-oriented. Rijndael’s round function consists of four layers. In the first layer, an 8x8 S-box is applied to each byte. The second and third layers are linear mixing layers, in which the rows of the array are shifted, and the columns are mixed. In the fourth layer, subkey bytes are XORed into each byte of the array. In the last round, the column mixing is omitted _[27].

The Rounds in detail

Each regular round involves four steps.

Figure 19 Step 1 of Round Byte-Sub _[17]

First is the Byte Sub step, where each byte of the block is replaced by its substitute in an S-box.

Figure 20 Step 2 of Round Mix Column _[17]

Next comes the Mix Column step. Matrix multiplication is performed: each column, in the arrangement we have seen above, is multiplied by the following matrix:

2	3	1	1
1	2	3	1
1	1	2	3
3	1	1	2

Figure 21 The Multiplication matrix

This multiplication is done over GF(2^8). This means that the bytes being multiplied are treated as polynomials rather than numbers. Thus, a byte "multiplied" by 3 is that byte XORed with that byte shifted one bit left. If the result has more than 8 bits, the extra bits are not simply discarded: instead, they're cancelled out by XORing the binary 9-bit string 100011011 with the result. This string stands for the generating polynomial of the particular version of GF(2^8) used; a similar technique is used in cyclic redundancy checks.

Figure 22 Step 3 of Round Shift Row _[17]

Considering the block to be made up of bytes 1 to 16, these bytes are arranged in a rectangle, and shifted as follows:

Figure 23 The blocks are shifted according to the above matrix _[17]

Figure 24 Step 4 of the Round Key addition

The final step is Add Round Key. This simply Xor’s in the subkey for the current round. The extra final round omits the Mix Column step, but is otherwise the same as a regular round. Because it begins and ends with an ARK (Add Round Key) step, there is no wasted unkeyed step at the beginning or end. Below is a graphical representation of the Rijndael Algorithm;

Figure 25 The structure of Rijndael

Serpent

Developed by Eli Biham, SERPENT is a "straight-through" algorithm (or substitution-linear transformation network), rather than a Feistel cipher. It consists of 32 rounds.

It is simple in appearance, using plain 4-bit-wide S-boxes without additional inputs and standard computer logic operations. It also includes an initial permutation and an inverse initial permutation, so that the S-boxes can be implemented with logic operations instead of table lookups; this is possible because the eight S-boxes used by the algorithm are used in sequence rather than in parallel_[25].

The round function consists of three layers: the key XOR operation, 32 parallel applications of one of the eight specified 4x4 S-boxes, and a linear transformation. In the last round, a second layer of key XOR replaces the linear transformation.

In a normal round, the first step is to XOR the current round's subkey with the 128-bit-wide block (the key XOR operation)._[27] Then, the entire block is transformed, nibble (4 bits) by nibble, according to the current S-box for the round. The S-boxes are numbered from S0 to S8, and are used in order repeatedly;

S0 for rounds 1, 9, 17, and 25,

S1 for rounds 2, 10, 18, and 26...

Then, the block goes through a series of mixing operations so that the different nibbles of the block interact.

This process proceeds, in bitslice mode, as follows; for the normal mode described here, this series of steps must be preceded by the inverse of the initial permutation and followed by the initial permutation to be correct. In the final round, the mixing operations are omitted. After the 32nd round, the bits are subjected to what is called in SERPENT the final permutation, which is the inverse of the initial permutation _[25].

Figure 26 The structure of one round in Serpent

Twofish

Developed by Bruce Schneier as a successor to his 64-bit Blowfish block cipher. Schneier described it very succinctly as follows;

"Specifically, Twofish is a 128-bit block cipher that accepts a variable-length key up to 256 bits. The cipher is a 16-round Feistel network with a bijective F function made up of four key-dependent 8-by-8-bit S-boxes, afixed 4-by-4 maximum distance separable matrix over GF(28), a pseudo-Hadamard transform, bitwise rotations, and a carefully designed key schedule."

Twofish has a block size of 128 bits, and accepts a key of any length up to 256 bits. Twofish is fast on both 32-bit and 8-bit CPUs and in hardware. It may have in network applications where keys are changed frequently and in applications where there is little or no RAM and ROM available. _[28]

It does not use a classic Feistel structure but rather a slightly modified version using 1-bit rotation. Twofish uses 40 32-bit subkeys. The first eight are used for whitening; four at the beginning and four at the end are XORed with the entire block. Each round uses two of the remaining 32 subkeys, and so Twofish has sixteen rounds. The 128-bit block is divided into four 32-bit quarters. In Each round two of these 32 bit words is used as the input into the f function.

Each is broken up into Four Bytes. The four bytes are sent through four different key dependent 8x8 SiBoxes. The Four Output Bytes are combined into 32-bit words by use of what is known as a "Maximum Distance Separable Matrix". The 2 32 bit words are combined using the following;

Figure 27 Pseudo-Hadamard Transform

Then they are added to the two round subkeys and finally Xored with the right hand side of the text.

Figure 28 The structure of Twofish

Comparison of the Finalists In Hardware

The NIST who where conducting the competition for the New Encryption standard enlisted The National Security Agency (NSA) to provide hardware performance measurements to aid NIST in their selection of the AES algorithm.

Round 2 consisted of tests to compare the hardware efficiency of the finalists. Round 1 was a mainly software based comparison. To cover a wide range of potential hardware applications, the NSA, used two distinct architectures, small area iterated version and a high speed, large area pipelined version._[29] The standard design approach consisted of creating hardware models using VHDL.

The NSA detailed all the findings in a document entitled Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms_[29]. This document is totally impartial and only gives fact about the algorithms and how they performed in the various hardware tests. I feel that this quote from the document sums up the NSA’s impartiality;

"It should be emphasized that any data point based on a single is a relatively narrow view of the algorithm’s overall performance or rating. For this reason, there was no attempt to rank algorithms in order. Rather, it is left to the cryptographic community to establish a consensus of the most important parameters – in combination or alone – and to draw appropriate conclusions from the data provided herein."_[29]

The Document is very comprehensive and testing the algorithms in a variety of ways;

Areas of comparison

Area

The two varieties of architectures, iterative and pipeline will be on the extremes of area with the iterative being the smallest, and the pipelined being the largest.

Throughput

In most cases, throughput is directly proportional to area; as area decreases, throughput decreases. Therefore Iterative architecture has a much lower throughput then the pipeline version. Throughput is reported for both architectures.

Transistor Count

Transistor count is a more specific measure than area and is often more useful. The transistor count can be useful to estimating programmable logic implementations since these devices typically report the number of useable gates.

Input/Outputs (I/O) Required

With the goal of consistency among algorithms, the NSA fixed the I/O for all algorithms.

Key Setup Time

The key setup time refers to the amount of time needed before subkey expansion is ready to execute. Some algorithms use the user-supplied key directly in the subkey expansion thereby reducing the key setup time to zero. Others require some pre-calculation or translation of the key prior to subkey expansion steps.

Algorithm Setup Time

None of the evaluated algorithms contained an algorithm setup time.

Time to Encrypt One Block

This parameter will address minimum latency times for each of the algorithms. The time to encrypt one block, measured in nanoseconds, is a function of two parameters: the worst-case path delay between any two registers, and the number of rounds in the algorithm.

Time to Decrypt One Block

As above, this parameter will address minimum latency times for each of the algorithm submissions. Decryption does not always require identical processing as encryption. Therefore, the time required to decrypt one block is reported.

In Appendix A I have reproduced the summary of each of the implementations of the five final algorithms. There are two tables for each algorithm (exception Serpent) one for the standard 128 bit key size and a second table detailing the result of the 3 in 1 algorithm which is an implementation of the algorithm that can use 128,129 or 256 bit keys.

Below I have reproduced two tables giving short details on each of the algorithms summarizing the results and performance metrics. These comparison values are given only for the combined key size design, which implements a selectable 128-bit, 192-bit, or 256-bit key in the same device as mentioned above. But they detail the two hardware architectures, ie. Iterative and Pipeline.

Figure 29 Iterative Summary _[29]

Figure 30 Pipeline Summary _[29]

Comparison of algoithms

The NSA report contains multiple disscussions comparing the 5 algorithms under the above criteria for both Hardware architectures and for both the 3-in-1 implementations and the single 128 bit implementation. After the study of the graphs and text within it, it becomes apparent that Mars is clearly the least efficient Algorithm. (See Appendix A)

Mars

Mars uses six times more area then RC6, Twofish and Serpant, for the iterivitve implementation and almost 3 times more for the Pipelined version. It’s transistor count is also way above the average. Its throughput for the Pipeline implementation is competitive with Twofish and RC6, but is particularly poor for the Iteravtive hardware. It consistantly has the worst times to encrypt and decrypt a block and it’s key setup time is extremly poor in respect to all the algorithms bar RC6.

RC6

I would Rank RC6 as the second worst algorthim, it does preform very well in both architectures in regard to having the smallest area and it also has a very competitive transistor count. It becomes very noticeable especially in the iterative hardware that its keysetup times and the time taken to encrypt a block are well below par. If you compare its throughput in the Iterivative version it is approximately ¼ of Rijndael score and the same is true in the Pipeline Arcihtecture. Where Serpant has a throughput of 8030 and RC6’s throughput is only 2171.

Twofish

Twofish is similar to RC6 in that it does particularly well in regard to area and transistor count. It also has on average the best keysetup times. It only loses out on throughput and time to encrypt/decrypt a block which are very closely linked in most algorithms.

Serpant

Serpant is a very strong algorithm. It was initally chosen as a back up algorithm until the idea of a backup algorithm was disregauarded. It performes brilliantly in the pipeline hardware beating Rijndael in everything except time to encrypt/decrypt a block. Unfortunately for Serpant, Rijndael excels in the Iterative hardware, with excellent throughput, keysetup time and time taken to encypt or decrypt a block That said, Serpant uses half the area used by Rijndael and just over 1/3 of the transistors.

Rijndael

Below I have reproduced the NIST’s reasons for selecting Rijndael as the new AES standard;

"When considered together, Rijndael's combination of security, performance, efficiency, ease of implementation and flexibility make it an appropriate selection for the AES. Specifically, Rijndael appears to be consistently a very good performer in both hardware and software across a wide range of computing environments regardless of its use in feedback or non-feedback modes. Its key setup time is excellent, and its key agility is good. Rijndael's very low memory requirements make it very well suited for restricted-space environments, in which it also demonstrates excellent performance. Rijndael's operations are among the easiest to defend against power and timing attacks._[26]"

Figure 31 The votes for the round 2 finalists at the last AES conference

Conclusion

The NIST have selected an extremely strong algorithm from five fine finalists, Serpant and Twofish being two excellent performers and equally worthy algorithms. To give an idea of the strength of Rijndael assume the "DES Cracker" mentioned in the previous chapter recovered a DES key in a second (i.e., try 2⁵⁵ keys per second), then it would take that machine approximately 149 thousand-billion (149 trillion) years to crack a 128-bit AES key. To put that into perspective, the universe is believed to be less than 20 billion years old._[27]it must also be remembered that the same implementation of Rijndael will also function with 192-bit and 256-bit keys. AES may last as long as DES, 20 years, assuming a weakness is not discovered in the algorithm. It should also be noted that it has been scrutinised for over a year by the cryptographic community.

The GSM Encryption Algorithms A3 A5 and A8

A3, The MS Authentication Algorithm& A8, The Voice-Privacy Key Generation Algorithm

Authentication involves a challenge-response mechanism between two functional entities, the SIM in the Mobile Station (MS) and the Authentication Centre (AuC). When an account is initially generated for a new subscriber, a secret key (K_i), unique to that account is produced, one copy of which is stored in the SIM and the other in the AuC. When the MS notifies the MSC of its presence, the local VLR (Visitor Location Register) contacts the mobile unit's HLR (Home Location Register) and transmits the VLR's Location Area Identifier (LAI) and the mobile's IMSI. The HLR asks its local AuC for a set of triplets containing:

A 128 bit random number (RAND) for use as a challenge.

A 32 bit signed response (SRES) where SRES = A₀₃(Ki, RAND) where Kiis the subscriber's key.

A 64 bit ciphering key K_c where K_c = A₈(K_A, RAND).

Figure 32 Mobile station authentication

The triplets are forwarded to the requesting VLR and each triplet is used only once for the authentication of the MS. The VLR transmits the challenge RAND to the mobile terminal, which uses the random number, in conjunction with the subscriber's secret key and the authentication algorithm A3 to generate SRES that is sent back to the VLR. If the received SRES is the same as that held by the VLR, the subscriber is authenticated. One session key, Kc, is used until the MSC decides to authenticate the MS again. This might take days _[13]. A₃ and A₈ are both unpublished, key-dependant, one-way hash functions.

Figure 33 A3 & A8’s Inputs and responses

COMP 128

Nearly every GSM operator in the world uses an algorithm called COMP128 for both A3 and A8 algorithms (As of April 1998 only five GSM Networks where known not to be using COMP128 _[43]). COMP128 generates both the SRES response and the session key, Kc, on one run. It was given as an example algorithm that could be used in the GSM System Security Study leaked document _[8].

I will briefly run though how Comp128 generates its output ( I have reproduced the algorithm in C code in Appendix C as leaked from The GSM System Security Study and with the reverse-engineering fixes) .

Figure 34 COMP128 calculation

Initially the RAND is loaded into a 32-byte register followed by Secret Key. So the first 128bits are the key and the rest is RAND. It consists of five P-Boxes. The 1st Box consists of 512 entries, the 2^nd 256, the 3^rd128, the 4^th 64 and final the 5^th consists of 32 entries. The Algorithm consists of 5 rounds and during each round the register is altered 8 times by a selection of the above S-boxes. The algorithm therefore consists of 40 substitutions. This is followed by 128 permutations so every bit is transposed to a new location.

The first four bytes of the register are sent as the SRES. According to the leaked GSM document "simoutput[4..11] is Kc" which is 8 bytes which would be a 64 bit Key. On inspection of the C code (Appendix C) you can see that that Kc is bits [74..127] of the COMP128 output, followed by 10 zeros. In other words, A5 is keyed with only a 54 bits key. This represents a deliberate weakening of the key used for voice privacy by a factor of over 1000.

Description of A5 Stream Cipher

The over-the-air privacy of GSM telephone conversations is protected by the A5 stream cipher. This algorithm has two main variants: The stronger A5/1 version is used by about 130 million customers in Europe, while the weaker A5/2 version is used by another 100 million customers in other markets. The approximate design of A5/1 was leaked in 1994, and the exact design of both A5/1 and A5/2 was reverse engineered by Briceno from an actual GSM telephone in 1999_[6]. This version was later verified by the GSM organisation _[5]. I will be describing A5/1, which is used in the European market. AS A5/2 the American variant has been deemed totally insecure having been crack in 2¹⁶iterations_[10].

A GSM conversation is sent as a sequence of frames every 4.6 millisecond. Each frame contains 114 bits representing the digitized A to B communication, and 114 bits representing the digitized B to A communication. A new session key Kc can encrypt each conversation. For each frame, Kc, 64-bit is mixed with a publicly known frame counter Fn, 22-bit and the result serves as the initial state of a generator, which produces 228 pseudo random bits. These bits are XORed by the two parties with the 114+114 bits of the plaintext to produce the 114+114 bits of the ciphertext.

A5/1 is built from three short linear feedback shift registers (LFSR) of lengths 19, 22, and 23 bits, which are denoted by R1; R2 and R3 respectively. The rightmost bit in each register is labeled as bit zero. The taps of R1 are at bit positions 13,16,17,18; the taps of R2 are at bit positions 20,21; and the taps of R3 are at bit positions 7, 20,21,22 (see Figure 35).

When a register is clocked, its taps are XORed together, and the result is stored in the rightmost bit of the left-shifted register, bit zero. The three registers are maximum length LFSR's with periods 2¹⁹ -1, 2²² - 1, and 2²³-1, respectively. They are clocked in a stop/go fashion using the following majority rule: Each register has a single "clocking" tap (bit 8 for R1, bit 10 for R2, and bit 10 for R3); each clock cycle, the majority function of the clocking taps is calculated and only those registers whose clocking taps agree with the majority bit are actually clocked.

Figure 35 A5 keystream generator

The process of generating pseudo random bits from the session key K and the frame counter F_n is carried out in four steps:

-- The three registers are zeroed, and then clocked for 64 cycles (ignoring the stop/go clock control). During this period each bit of K (from lsb to msb) is XORed in parallel into the lsb's of the three registers.

-- The three registers are clocked for 22 additional cycles (ignoring the stop/go clock control). During this period the successive bits of F_n (from lsb to msb) are again XOR'ed in parallel into the lsb's of the three registers. The contents of the three registers at the end of this step are called the initial state of the frame.

-- The three registers are clocked for 100 additional clock cycles with the stop/go clock control but without producing any outputs.

-- The three registers are clocked for 228 additional clock cycles with the stop/go clock control in order to produce the 228 output bits. At each clock cycle, one output bit is produced as the XOR of the msb's of the three registers

See Appendix B for a C implementation of A5 as verified by the GSM organization.

Attacks

COMP128

In April 1988 Ian Goldberg and Marc Briceno _[44] showed that COMP128 is cryptographically weak, and it is not difficult to break the algorithm and clone GSM digital phones. They formed a number of specially chosen challenges and query the SIM for each one. By analyzing they were able to determine the 128bit value of the Ki. This could then be used to clone another mobile and the billing would be attributive to the stolen phone.

To mount the attack you need physical access to the target SIM, an off-the-shelf smartcard reader, and a computer to direct the operation. The attack requires one to query the smartcard about 150,000 times; smartcard readers can issue 6.25 queries per second, so the whole attack takes 8 hours. Very little extra computation is required to analyze the responses.

The attack exploits a lack of diffusion: there's a narrow ``pipe'' inside COMP128. In particular, bytes i,i+8,i+16,i+24 at the output of the second round depend only on bytes i,i+8,i+16,i+24 of the input to COMP128. Bytes i,i+8 of the COMP128 input are bytes i,i+8 of the key, and bytes i+16,i+24 of the COMP128 input are bytes i,i+8 of the challenge input.

Now we ``probe'' the narrow pipe, by varying bytes i+16,i+24 of the COMP128 input and holding the rest of the COMP128 input constant. Since the rounds are non-bijective, you can hope for a collision in bytes i,i+8,i+16,i+24 of the output after two rounds. The birthday paradox guarantees that collisions will occur pretty rapidly (since the pipe is only 4 bytes wide); collisions in the narrow pipe can be recognized, since they will cause a collision in the output of COMP128 (i.e. the two authentication responses will be the same); and each collision can be used to learn the two key bytes i,i+8 with a bit of analysis of the first two rounds (i.e. perform a ``2-R attack'', in the terminology of differential cryptanalysis).

As stated, this would require 2^{4*7/2 + 0.5} = 2^{14.5} chosen-input queries to COMP128 to learn two key bytes (since each of the four bytes of output after the second round are actually only 7-bit values), and thus would require 8 * 2^{14.5} = 2^{17.5} queries to recover the whole 128-bit key Ki. However, the authors used optimization to get this figure down.

This attack is not particularly novel a lot of research has been done in this area _[46]ie. cryptographic hash functions of a FFT-like structure. COMP128 is a hash function of this design.

An Organization Called Echelon provides all the information necessary to clone a GSM phone, including software and hardware providers, http://www.echelon.cx.

COMP128 authentication works in other countries as well, because the local network asks the HLR of the subscriber's home network for the triplet (RAND, SRES, Kc). Thus, the local network does not have to know anything about the algorithms used. With 80 million GSM users, fixing flaws in such a widely fielded system is likely to be quite costly. A new authentication algorithm would have to be selected. Then new SIMs would have to be programmed with the new algorithm, and distributed to the 80 million end users. Finally, a software upgrade may be required for all authentication centres.

The Berkeley group

The Berkeley group published and analysed A5/2 in August 1999. As the weaker of the two voice-encryption algorithms, it proved to be very weak. It can be broken in real-time without any trouble; the work factor is only 2¹⁶ for a 64 bit key algorithm _[44].Its weakness has been blamed on the NSA’s hand in it’s design. They introduced a 4^th LSFR, which decide on the clocking of the other 3 registers. This register is only 16 bits long hence the weakness of the algorithm.

Briceno

Briceno _[6] found that in all the deployed versions of the A5/1 algorithm, the 10 least significant bits of the 64 key bits were always set to zero. Thereby reducing the complexity of exhaustive search from 2⁶⁴to 2⁵⁴.This flaw is essentially the fault of the implementation of the COMP128 algorithm see comment in Appendix C for more Details.

Anderson and Roe

Anderson and Roe _[10] proposed a divide-and-conquer attack based on a known-plain-text attack. The attacker guessing the 41 bits in the shorter two registers and tries to determine the initial states of the 23 bits longer R₃ register from known keystream sequences. The attacker needs to know 64 successive keystream bits that can be retrieved if the attacker knows some cipher text and the corresponding plain text _[12]. This depends largely on the format of the GSM frames sent back and forth. The GSM frames contain a lot of constant information, e.g. frame headers. The required 64 bits might not always be known, but 32 to 48 bits are usually known, sometimes even more keep in mind that the attacker needs only one 64-bit plain text segment. This would be a 2⁴⁰ attack, if the clocking of the first two registers were not dependent on the third register. Because the middle bit of the third register is used for clocking, we have to guess about half of the bits in the third register between the clock bit and the LSB as well. This fact increases the time complexity from 2⁴⁰ to 2⁴⁵ _[10].

J. Golic

J. Golic _[12] has proposed another divide-and-conquer attack based on the same assumptions with the average complexity of 2^40.16. Golic showed that only 2^62.32 internal states could be reached from the 2⁶⁴ initial states _[12]. Based on this assumption, he describes how to obtain linear equations by guessing n bits in the LSFRs. By solving these linear equations, one could recover the initial states of the three LSFRs. The complexity of solving the linear equations is 2^41.16. On average, one would resolve the internal state with 50 per cent chance in 2^40.16 operations. However, each operation in this attack is much more complicated then the Anderson and Roe attack, as it is based on the solution of a system of linear equations. In practice, this algorithm is not likely to be faster than the previous attack on a PC, but would be faster in Hardware _[5].

Golic also proposed a Time-Memory Trade-Off Attack based on the Birthday Paradox (see glossary) in the same paper _[12]. The idea behind this attack is to recover the internal states of the three LSFRs at a known time for a known keystream sequence corresponding to a known frame number, thus reconstructing the session key, Kc. He concludes that it is possible to find the A5/1 key in 2²² probes into random locations in a precomputed table with 2⁴² 128bit entries. Since such a table requires a 64-terabyte hard disk, the space requirement is unrealistic. Alternatively, it is possible to reduce the space requirement to 862 gigabytes, but then the number of probes increases to O(2²⁸). It takes Approximately 6 milliseconds to Access the fastest commercially available PC hard drive. This would mean that it would take almost 3 weeks to do the necessary probes _[5].In addition, this tradeoff point can only be used to attack GSM phone conversations, which last more than 3 hours, which again makes it unrealistic.

Biryukov, Shamir, Wagner

Biryukov, Shamir, Wagner detailed two attacks in a paper titled "Real Time Cryptanalysis of A5/1 on a PC"_[5] and several variants. They describe a Biased Birthday attack, which requires 2⁴² preprocessing steps and 292 GB of Hard drive space, yet with two minutes of data can extract the key in 2 second. A variant requires 2⁴⁸Preprocessing steps and 146 GB of Disk space where the conversation need only be 2 second long and can generate the key in several minutes.

The idea behind the Bias Birthday attack is to search through each of the frames of the conversation for 2 minutes. They are looking for an occurrence of a predefined string of 16 bits alpha (e.g. 10000…0) from bits number 101 to 277. During the first 2 minutes of the conversation approximately (2^-16* 177*120*1000/4.6) 71 states containing alpha should occur they are called "green-states". Stored on the hard drives are chosen initial states called "red states". A state is coloured red if its sequence of output bits contain alpha between steps 101 and 277. The state of A5 contains 64 bits, but they keep only special states and thus we can encode them efficiently with shorter 48 bit names. From a red state there is little computation to product all of it’s green states, and there is also very little computation in generating all of the possible keys that formed this red state (Red-states are the states after the 100 mixing clocks).

When a green state is found in the conversation, you work out where alpha occurred in the green state and then probe the hard drive to find the red state that would product alpha at this time. You then generate all of that red state’s green states and if one is the same as the one in the conversation you generate all that red states possible key. Then you test all the key to see if one is correct and if so you have the session key. If not you wait for the next occurrence of alpha.

Figure 36 Trees of different sizes

The crucial point about the bias birthday attack is that in A5 there is a huge variance in the weight of the various red states. They found that the weight of about 85% of the states was zero, because their trees died out before reaching depth 100. So they optimised the attack in the pre-processing stage by only storing the heaviest tress. Each stored state increases the coverage by 12,500 green states on average, which improved their attack by almost two orders of magnitude.

The Random subgraph attack is very similar only after getting the first occurrence of alpha which usually occurs in the first 2 seconds they try to find the session key on this state alone by storing twice as many red states.

The Attacks I implemented

Computaional Attack on COMP128

After reading extensively about the weakness of COMP128 I decided to attempt to break the Algorithm and comprimise the Secret key Ki. Smart cards have the following structure;

Figure 37 Structure of a Sim card

A smart card is a plastic card, with an embedded computer chip, usually an 8-bit microprocessor, RAM (1/4 of a kilobyte), ROM (6 to 8 kilobytes), and either EPROM or EEPROM ( a few kilobytes). The card has it’s own operating system programs and data, and gets it’s power from the reader.

A communication between the outside world and the card involves the following steps:

1. Activation of the contacts by the smart card reader
2. Resetting of the card by the reader
3. Answer-to-reset by the card
4. Optional selection of a protocol type
5. Processing of successive commands
6. Deactivation of the contacts by the card reader.

C1: Power supply (VCC)	C5: Ground (GND)
C2: Reset (RST)	C6: Programming voltage (VPP)
C3: Clock (CLK)	C7: Input/output (I/O)
C4: Reserved (RFU)	C8: Reserved (RFU)

Figure 38 The contacts and their uses on a Smart card

Input/output involves asynchronous characters transmitted in half-duplex mode. Each character is ten consecutive bits: a start bit, eight data bits, and an even parity bit. A short interval or "guard time" between successive characters allows for synchronization in the transmission.

I bulit the following circuit, to implement a smart card reader;.

Figure 39 The Circuit for a Smart Card Reader._[3]

The Program that this circuit is used with is called "Sim Scan" which was designed by Dejan Kajevic._[3] It allows functional analysis of your GSM SIM smart card. The circuit has no external power source it obtains all of it power from the RS232 cable (Serial Port). It was designed to read ISO 7816 smart cards, GSM smart cards are of this type _[3]. The Program lets you analyse the following on any smart card.

ATR: "Answer To Reset" strings.

CLA: Application class

INS: Instruction code

Files: Any files that are stored on the smart card

Ki: The scerect Key for the COMP128, only applicable to GSM SIM cards.

Some Smart cards may be destroyed buy using the Ki search (e.g. PrePaid cards) as they only have a limited number of runs of the A3A8 algorithm usually 10000 to 65536 times _[2].

Figure 40 Sim Scan asking for the Pin of the card to be cracked

This version of the COMP128 attack implemented in the program is only a simple method that allows key finding in 90% of cases. It is similar but not identical to the method mentioned above. During the work it is possible to interrupt the program by pressing any key. In case of interrupting, a temp file is saved, and later you may continue the analysis from that point.

Figure 41 Smart Card Reader attached to serial port.

Most of the components where readily available from the college stores. The sim card holder and the low powered hex inverter (74HC04) where not available so I sent to Farnell for these an some other components. During Testing I discovered several mistakes and rectified them. But after buliding the circuit to the exact specification discribed above I was still unable to get the circuit to operate. After further testing It seemed that I was not getting enough power to power either the sim card or the hexs inverter both of which needed 5volts. It seem that the designer had mis calculated the voltage drop over the transistors. After using external power to power these two component I was able to get the simcard reader operational.

I obtained an old prepaid sim card and begain an attack on the card. Initially the program seemed very unstable and frequentely halted because it was not ressiving an "answer-to-reset" from the card. When this happened I would have to restart the program fortunately because of the temp file I was able to start from where the program had crashed. After consulting with the technicians in the lab I got the data sheets for some of the components and discovered that the crystal ossilator in the circuit could vary by + or – 20% of it’s printed value. After trying several crystal one seemed to match the baud rate and made the circuit much more stable. It decoded two bytes every hour and a half approximately. After 13 hours the program cracked the sim cards secert Key Ki. Also the option to attack Ki, was the only function that would work in the program all other function caused a fatal error that crashed the program.

Figure 42 Ki found by Sim_Scan

It may be possible to speed up this attack 2 or 3 fold by using a faster crystal oscillator and higher baud rate. A sim card will operate up to a clock speed of approximately 11Mhz _[9]so a baud rate of 23040 bps and a crystal oscillator running at 8.57MHz. Could be tried or if possible a baud rate of 2880bps and a 10.71 MHz crystal.

Conclusion on Comp128 Attack

COMP128 is particularly weak; all the information need to extract Ki form a sim card was readily available on the internet. David Wagner is quoted as saying "There's no way that we would have been able to break the cryptography so quickly if the design had been subjected to public scrutiny, Nobody is that much better than the rest of the cryptography research community _[44]." Although the secret key can be compromised in approximately 8 hours you do need physical access to the card.

Over-the-air cloning

It has now be come clear that to a well equipped attacker an over-the-air attacks is possible. GSM experts reported that a number of aspects of the GSM protocols combine to make it possible to mount the mathematical chosen-input attack on COMP128, if one can build a fake base station. Such a fake base station does not need to support the full GSM protocol, and it may be possible to build one with an investment of approximately $10k _[44].

Some technical expertise is probably required to pull off the over-the-air cloning attack, and the attack requires over-the-air access to the target handset for a relatively long period of time. To get around this problem The attack can also be performed in parts: instead of performing an eight-hour attack, the attacker could question the phone for twenty minutes every day on the victim's way to work. Once the SIM is cloned, the SIM-clone is usable until the subscriber gets a new SIM, which in practice does not happen very often. Therefore, over-the-air cloning must be considered a very real threat, which should not be ignored. Finally there has been some talk about a new version of COMP128, COMP128/2 but to implement this algorithm every Sim card would have to be replaced. AlsoRoss Anderson Eli Biham and Lars Knudsen described the merits of Serpent for use on a limited 8-bit processor as is used in smart cards _[45]. In the document the discribe why Serpent is suitable for the restrictive environment of a smart card where speed is of importance but ram and processing power is very limited. Other stronger one way hash functions could also replace COMP128.

Brute Force attack on A5, known plaintext.

Although from the last chapter it is clearly showed that there are much more efficient attacks on A5 then a brute force. I attempted some exhaustive search attacks to see A5’s resilience to such an attack. A brute force attack is an attack where each possibility is tried until success is obtained. Typically, a ciphertext is deciphered under different keys until plaintext is recognised. Unfortunately with A5 the plaintext is simple a group of 114 bits so recognising a correct deciphered text is difficult.

Figure 43 GSM TDMA structure of the normal burst

Each timeslot within a TDMA frame contains modulated data referred to as a "burst". There are five burst types normal, frequency correction, synchronization, dummy, and access bursts. The normal burst is composed of a 3-bit start sequence, 116 bits of payload, a 26-bit training sequence used to help counter the effects of multi-path interference, a 3-bit stop sequence required by the channel coder, and a guard period Two bits from the 116-bit payload are used by the Fast Associated Control Channel (FACCH) to signal that a given burst has been borrowed, leaving a total of 114 bits of payload. Contrary to some papers I have read only information in the Payload section is encrypted meaning that we cannot use the training sequences as source of plaintext. During about the first 1/10th of a second of a call the coder will encode silence. A very rough estimate is 13000 bps * 0.1 s = 1300 bits of known plaintext _[6]. So to attempt to break the conversation your ciphered text will come from the first 24 frames.

Initially I implemented a brute force attack on A5. I took the verified C code for A5 as reverse engineered by Marc Briceno, Ian Goldberg, and David Wagner and generated the first 328 random bits for the key 12 23 45 67 89 AB CD EF and the frame no.134. Then I wrote a loop to generate the output of all keys for a fixed frame no. Starting with the key value 00000000000000. For Each key the program generates the initial 100 bits, which are disregarded, and then the 114 bits for the upstream and also the 114 bits for the downstream. Then the program compares these 228 bits against the previously generated bits for the key, and if they match the session key is compromised.

Generate the first byte then compare

The second Brute force attack was almost Identical to the first. The only difference came in the comparisons. After generating the 100bits that are disregarded I generated the first byte of the upstream only. I then compared this against the known good output. If they were the different I disregard the key at this stage. There by saving the time need to generate the remaining 220 bits. If the first byte generated was the same as the known good output I would generate the remaining 220 bits of the output and compare then against the previously generate output. This improves the efficiency of the program making it approximately 3 times fast then the initial attack.

Reduce the output

The next improvement I made was from reading J. Golic, Cryptanalysis of Alleged A5 Stream Cipher _[12]. The internal state size of A5 is 64 bits. To generate a plaintext attack we only need to know 64 successive bits not the full 228 bits. By reducing the number of bits to generate from 328 to 164 I will reduce the amount of computation necessary. Unfortunately this will not increases the processing speed drastically as only keys that have the same first byte as the session key will benefit from this improvement in the attack. I have reproduced the code for this attack in Appendix E.

C optimisation

There are numerous methods of optimisation that may be implemented. Techniques such as loop unrolling and strength reduction, it tends to add apparent complexity and hides the core functionality of the program. Function calls for example are a good thing but a function call interrupts an optimiser’s train of thought in a drastic way. Any references through pointers or to global variables need to be saved /restored across the function call. Local variable that have had their address taken and passed outside the function also suffer. There is also some overhead to the function call itself as the stack must be manipulated and the program counter altered. I implemented some of these techniques in conjunction with compiling the program in the release and not the Debug version to increases the speed 5 fold. Although the code did bulk up severely.

Finding the best compiler

Finally I tried several compilers from commercial to freeware to improve the programs speed by a further 25%.

Results and Conclusions

No.	Description	Key-space	Duration of Search	Keys sec^-1	50% of total keyspace
1	Basic Brute force attack	0x444000	17H 56m	1110	257312 yrs
2	Generate one byte and compare	0x444000	6H 27m	3089	92462 yrs
3	Reduce output from 228 to 164 bits	0x444000	6H 20m	3110	91838 yrs
4	C-Optimisation	0x444000	1H 38m	11931	23939 yrs
5	Most efficient Compiler	0x444000	1 H 14m	15950	17907 yrs

Figure 44 Results of a key searches on a single computer

All results listed in this section were attained using a computer with 128Mb of RAM and a 450MHz Pentium II processor running Windows NT. I ran each of the attacks 3 times to remove any deviations. As you can see I reduced the attack almost 260 thousand years to just below 18 thousand years. Although this is almost 14 times faster the attack is still totally unrealistic. Any of the attack describe in the pervious section is much more effective.

Figure 45 My fastest attack took 4487 seconds to generate the key space

The attacks where on a key space of 0X 444 0000 (71565312 decimal) keys. I have listed the main steps of the improvement of the attack in the box above. It should also be noted that an increases in processing power will drastically increases the number of keys generated per second but that an increases in RAM does not effect key generation. As you can see from figure 32 above the processor is working at 100% but less then 6Mbytes of RAM is being used.

Figure 46 The acceleration of the key search

Hardware base attack

I have obtained the verilog code to implement the A5 algorithm (See Appendix D). So it would not be a giant leap to load this onto a Xilinx chip and start testing keys. Hardware attacks are thousands of times faster that software attacks. A chip costing $10 can be nearly 1000 times quicker that the average home PC running an optimised executable coded in C _[1]. A hardware-based attack is the only way to make it feasible to crack A5/1 using a Brute Force attack with currently available computer power. Even porting the attack to a distributed Environment would only marginally decreases the time needed to recover a session key.

Figure 47 Processing power and Ram being used during an attack

Simulation of the GSM Environment

After implementing the attacks above I had all the information necessary to roughly simulate the GSM Environment. This has all the interaction of a really GSM environment but the interaction is through file rather the over the air. Also I am only transmitting the encrypted voice and not a full frame. Initial I took Ki as extracted from the Sim card.

Figure 48 The GSM environment that was implemented

Then I took a random number as would be generated in the Base Station RAND. These two 128-bit numbers are put into text files Ki.txt and RAND.txt respectively. I use them as input to COMP128 and it produces Kc.txt the session key and SRES.txt the reply to the Base Stations Random challenge. Then Kc.txt and the Plaint.txt are ran through A5 to generate a ciphered text that would be sent to the base Station. This ciphered text is used in conjunction with Kc.txt to emulate the decryption of text at the base station to produce Plainout.txt. Although it hardly emulates all the protocols need to implement the GSM service, it does total implement the interaction between the two algorithms use in GSM. Namely the generation of A5’s session key by COMP128 from RAND and Ki. But also the synchronisation of the two keystream generators which emulate A5 in both the Mobile Station and the Base Sation.

Conclusion

In this paper I have shown many flaws in the GSM cryptographic algorithms. I was able to compromise COMP128 and showed numerous flaws in A5 including the fact that it is really only 2⁵⁴-bit encryption. I believe that these have come from the way that these algorithms where developed in secrecy. In time all were leaked or reverse engineered, versions of A3 A8 A5/1 and A5/2 are all freely available. These algorithms where developed in 1989 – 90 and are just over ten years old. In contrast DES was developed in total openness and released to the entire cryptographic community. It was designed in the mid 70’s and remained a government standard for over 20 years. When it was replaced it was replaced by triple DES simply because it’s key length was too short but everything else about the algorithm is still strong. The AES has been released in a similar fashion to DES, and is presently being investigated for weaknesses on many fronts.

Essentially the telecom companys need to understand that "security through obsucrity" is not the way to develop cryptographic systems. There is also the question of government industrys input into the deliberately weakning of these algorithms. The National Security Agency is known to have pressured the analogous U.S. standards body to weaken voice privacy. Unless consumers demand better, the situation is unlikely to change and we will still only get 54 bit security when we were sold 64 bit, as happened with A5. Both voice-encryption algorithms are flawed, but not obviously so. The attacks on both A5/1 and A5/2 make use of subtle structures of the algorithm, and result in the ability to decrypt voice traffic in real time on average computer equipment. Another point is that if government or police agencies need to acquire legal phone taps they can tap the phone lines at the exchanges where they are not encrypted. The only reason to weaken this system is for "illegal" access. Only wiretaps lacking a court authorization need over-the-air intercepts.

As we look towards 3G you would have thought that the telecom industries would have learnt from their mistakes but as you can see from the following quote from Marc Briceno they have not;

"The GSM MoU still has not recognized the dangers inherent in a secret design process; they have not opened up the design to public review, and in fact they apparently will be designing the next-generation GSM standard (G3) again using this flawed design process. With this decision, we can only expect that further flaws will be discovered in the future. I remain sharply critical of the GSM industry's approach to cryptographic design: it is, in my opinion, irresponsible to consumers and poor scientific practice to boot."

Room for improvement

The basic ideas behind A5 are good. It is very efficient in the limited space and hardware applications that it is used in. But all of it’s weaknesses stem from the it’s short registers. Variants of A5 with longer shift registers and denser feedback polynomials should be secure. A5 only uses LFSR where other stream ciphers like RC4 and SEAL are designed more like block ciphers with nonlinear transformations and large S-box etc. Or a block cipher that has been tried and tested could be use in OFB (Output-FeedBack) or CFB (Cipher-FeedBack) mode to get a stream cipher. We have seen how well Serpent Twofish and Rijndael perform in small areas.

Figure 49 Cipher-FeedBack mode

In CFB mode, data can be encrypted in units smaller that the block size. The example above is working on 8-bits. You can encrypt data one bit at a time using 1-bit CFB, which would almost emulate A5, but it seems a lot of encryption for just one bit. Essentially this mode forces a block cipher to be a stream cipher, by using the part of the ciphered text as a function of random number generator. Initially the queue is filled with random data. OFB is similar except that n-bits of the previous block (ki) are moved into the right-most position of the queue instead of Ci. A5/1 could also be replace by PIKE, RC4 or SEAL

References

1. Bruce Schneier, Applied Cryptography Second Edition, 1996, p 4,189,197-198

2. Chaos Computer Club http://www.ccc.de

3. Dejan Kaljevic the author of sim-scans Web page http://www.net.yu/~dejan/

4. The GSM System for Mobile communication, Michel Mouly and Marie-Bernadette Pautet, 1992.

5. Alex Biryukov, Adi Shamir, David Wagner, Real Time Cryptanalysis of A5/1 on a PC, April, 2000, presented at the Fast Software Encryption Workshop 2000 New York City

6. M. Briceno, I. Goldberg, D. Wagner, A pedagogical implementation of A5/1, http://www.scard.org, May 1999.

7. Mobile Communications, Jochen Schiller, 2000

8. GSM System Security Study,10-1617-01, 10^th June 1988 by RACAL RESEARCH LTD.,READING, BERKS. RG2 0SB, ENGLAND. http://jya.com/gsm061088.htm

9. HIP Smartcard webpage - http://cuba.xs4all.nl/~hip/

10. R. Anderson, M. Roe, A5, http://jya.com/crack-a5.htm, 1994.

11. S. Babbage, A Space/Time Tradeoff in Exhaustive Search Attacks on Stream Ciphers, European Convention on Security and Detection, IEE Conference publication, No. 408, May 1995.

12. J. Golic, Cryptanalysis of Alleged A5 Stream Cipher, proceedings of EUROCRYPT'97, LNCS 1233,pp.239{255, Springer-Verlag 1997.

13. Lauri Pesonen GSM Interception 21.11.1999

http://www.tml.hut.fi/Opinnot/Tik-110.501/1999/papers/gsminterception/netsec.html

14. Carl H. Meyer & Stephen M. Matyas, Cryptography, 1982 John Wiley & Sons, Inc.

15. http://www.it.murdoch.edu.au/~hiew/b227/lectures/Topic7-Security.html

16. http://www.math.ohiou.edu/~qvu/crypto/9.html

17. John Savard's Rijndael page http://home.ecn.ab.ca/~jsavard/crypto/co040801.htm

18. NIST AES press release 9 Aug 1999, Phillip Bullman

20. IBM’s submission to the AES Board, September 22 1999.

21. Detailed discribtion of Mars by crypto http://home.ecn.ab.ca/~jsavard/crypto/co040806.htm

22. Mars’s home page at IBM http://www.research.ibm.com/security/mars.html

23. Twofish’s home page http://www.counterpane.com//twofish.html

24. Discribtion RC6 by crypto http://home.ecn.ab.ca/~jsavard/crypto/co040804.htm

25. Discribtion of Serpent by crypto http://home.ecn.ab.ca/~jsavard/crypto/co040803.htm

26. AES Fact sheet from the NIST’s WebPages. http://csrc.nist.gov/encryption/aes/aesfact.html

27. Report on the Development of theAdvanced Encryption Standard (AES), By the National Institute of Standards and Technology, October 2, 2000

28. Dr. Dobb's Journal December 1998 http://www.ddj.com/articles/1998/9812/9812b/9812b.htm

29. Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms by Bryan Weeks, Mark Bean, Tom Rozylowicz, Chris Ficke, National Security Agency

30. What is Differential Cryptanalysis? RSA’s FAQ No. 58

http://www.iks-jena.de/mitarb/lutz/security/cryptfaq/q58.html

31. What is Linear Cryptanalysis? RSA’s FAQ No. 59

http://www.iks-jena.de/mitarb/lutz/security/cryptfaq/q59.html

32. M. Matsui and A. Yamagishi. A new method for known plaintext attack of FEAL cipher. In Advances in Cryptology - Eurocrypt '92, pages 81-91, Springer-Verlag, 1992.

33. M. Matsui. Linear cryptanalysis method for DES cipher. In Advances in Cryptology - Eurocrypt '93, pages 386-397, Springer-Verlag, 1993.

34. S.K. Langford and M.E. Hellman. Differential-linear cryptanalysis. In Advances in Cryptology - Crypto '94 , pages 17-25, Springer-Verlag, 1994.

35. B.S. Kaliski Jr. and M.J.B. Robshaw. Linear cryptanalysis using multiple approximations. In Advances in Cryptology - Crypto '94, pages 26-39, Springer-Verlag, 1994.

36. RSA FAQ – DES http://www.rsasecurity.com/rsalabs/faq/3-2-1.html

37. M.E. Hellman, A cryptanalytic time-memory trade off, IEEE Transactions on Information Theory (1980), 401-406.

38. M.J. Wiener, Performance Comparison of Public-Key Cryptosytstems, CryptoBytes (1) (Summer 1998).

39. E. Biham and A. Shamir, Differential cryptanalysis of the full 16-round DES, Advances in Cryptology - Crypto '92, Springer-Verlag (1993), 487-496.

40. RSA FAQ - What are the most important attacks on symmetric block ciphers?

http://www.rsasecurity.com/rsalabs/faq/2-4-5.html

41. U. Maurer, Towards the equivalence of breaking the Diffie-Hellman protocol and computing discrete logarithms, Advances in Cryptology - Crypto '94, Springer-Verlag (1994), 271-281.

42. Cracking DES, Secrets of Encryption Research, Wiretap Politics & Chip Design

By Electronic Frontier Foundation, 1st Edition July 1998

43. An implementation of the GSM A3A8 algorithm. (Specifically, COMP128.)

44. The Berkeley group COMP128 analysis, April 1998 , Ian Goldberg & Marc Briceno http://www.isaac.cs.berkeley.edu/isaac/gsm.html

45. Serpent and Smartcards, Ross Anderson Eli Biham Lars Knudsen

46. Serge Vaudenay's ``FFT-Hash II is not yet secure''

Further Info,

Shepherd S. J., Cryptanalysis of the GSM A5 Cipher Algorithm, IEE Colloquium on Security and Cryptography Applications to Radio Systems, Digest No. 1994/141, Savoy Place, London, 3 June 1994

Shepherd S. J., An Approach to the Cryptanalysis of Mobile Stream Ciphers, IEE Colloquium on Security and Cryptography Applications to Radio Systems, Digest No. 1994/141, Savoy Place, London, 3 June 1994

I assume both of the documents mentioned above would reveal a lot about the A5 stream cipher and its weaknesses. Unfortunately, Mr Shepherd was convinced not to give either of the above talks and days afterwards both document where classified my the English Government, Mr Shepherd cannot comment on this.

Glossary

Plaintext: message string to be encrypted.

Ciphertext: encrypted message

Encryption method: algorithm to convert from plaintext to ciphertext.

Encryption Key: a string used in by the encryption method to transform plaintext to ciphertext.

Decryption method: algorithm to convert from ciphertext back to original plaintext.

Decryption Key: a string used in by the decryption method to convert ciphertext back to original plaintext.

Cryptography: the art of coming up with ciphers (methods, keys, etc).

Cryptanalysis: the art of breaking ciphers.

Birthday Paradox: the apparent paradox that, in a schoolroom of only 23 students, there is a 50% probability that at least two will have the same birthday. The "paradox" is resolved by noting that we have a 1/365 chance of success for each possible pairing of students, and there are 253 possible pairs or combinations of 23 things taken 2 at a time. The overall probability of success is (1 –1/365) 0.99726 multiplied by itself for each pair, which equals 0.4995. So the success probability for (23 * 22 / 2) = 253 pairs is 0.5005 i.e. 50% chance.

Whitening The very first and last cryptographic operations are some form of mixing of subkeys with the data block in most block ciphers. Whenever this subkey mixing does not naturally occur as the initial step of the first round or the final step of the last round, the subkey mixing as an extra step is called pre- or post-whitening. _[27]

VHDL Stands for VHSIC Hardware Description Language. VHSIC is yet another achronym which stands for Very High Speed Integrated Circuits.

Acknowledgements

I would like to thank the following;

Tom Coffey my supervisor.

Aileen McCarthy for telling me the best ways to corner my supervisor.

Hugh McGuirk for answering all my stupit questions.

Trevor O’Connor, Donal Ward, Noel Guinane, Aongus McGreel, Tomas Winston, James Forde and Tracey Flannagan.

Appendix A - Hardware Performance Figures for the AES Finalists.

The following is a reproduction of some of the more telling results for the Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms by the National Security Agency _[29].Please see Chapter three for more details;

Figure 50 Mars 3-in-1 result summary

Figure 51 Mars 128Bit result summary

Figure 52 RC6 3-in-1 result summary

Figure 53 RC6 128Bit-in-1 result summary

Figure 54 Rijndael 3-in-1 result summary

Figure 55 Rijndael 128Bit result summary

Note; Both Key Setup and Key Agility times for Twofish are different.

Figure 56 Serpent 3-in-1 result summary

Within the implementation of the SERPENT algorithm, the key size is padded to 256 bits internally, so the effects of varying key sizes have negligible impact on the performance metrics of the design. Consequently, only the 3-in-1cases is provided.

Figure 57 Twofish 3-in-1 result summary

Figure 58 Twofish 128Bit result summary

Note; Both Key Setup and Key Agility times for Twofish are different.

Appendix B - A implemantation of A5 in C.

* A pedagogical implementation of A5/1.

* see www.scard.org for full version

#include <stdio.h>

/* Masks for the three shift registers */

#define R1MASK 0x07FFFF /* 19 bits, numbered 0..18 */

#define R2MASK 0x3FFFFF /* 22 bits, numbered 0..21 */

#define R3MASK 0x7FFFFF /* 23 bits, numbered 0..22 */

/* Middle bit of each of the three shift registers, for clock control */

#define R1MID 0x000100 /* bit 8 */

#define R2MID 0x000400 /* bit 10 */

#define R3MID 0x000400 /* bit 10 */

/* Feedback taps, for clocking the shift registers.

* These correspond to the primitive polynomials

* x^19 + x^5 + x^2 + x + 1, x^22 + x + 1,

* and x^23 + x^15 + x^2 + x + 1. */

#define R1TAPS 0x072000 /* bits 18,17,16,13 */

#define R2TAPS 0x300000 /* bits 21,20 */

#define R3TAPS 0x700080 /* bits 22,21,20,7 */

/* Output taps, for output generation */

#define R1OUT 0x040000 /* bit 18 (the high bit) */

#define R2OUT 0x200000 /* bit 21 (the high bit) */

#define R3OUT 0x400000 /* bit 22 (the high bit) */

typedef unsigned char byte;

typedef unsigned long word;

typedef word bit;

/* Calculate the parity of a 32-bit word, i.e. the sum of its bits modulo 2 */

bit parity(word x) {

x ^= x>>16;

x ^= x>>8;

x ^= x>>4;

x ^= x>>2;

x ^= x>>1;

return x&1;

}

/* Clock one shift register */

word clockone(word reg, word mask, word taps) {

word t = reg & taps;

reg = (reg << 1) & mask;

reg |= parity(t);

return reg;

}

/* The three shift registers. They're in global variables to make the code easier to *understand. A better implementation would not use global variables. */

word R1, R2, R3;

/* Look at the middle bits of R1,R2,R3, take a vote, and return the majority value of *those 3 bits. */

bit majority() {

int sum;

sum = parity(R1&R1MID) + parity(R2&R2MID) + parity(R3&R3MID);

if (sum >= 2)

return 1;

else

return 0;

}

/* Clock two or three of R1,R2,R3, with clock control according to their middle bits.

* Specifically, we clock Ri whenever Ri's middle bit agrees with the majority value of *the three middle bits.*/

void clock() {

bit maj = majority();

if (((R1&R1MID)!=0) == maj)

R1 = clockone(R1, R1MASK, R1TAPS);

if (((R2&R2MID)!=0) == maj)

R2 = clockone(R2, R2MASK, R2TAPS);

if (((R3&R3MID)!=0) == maj)

R3 = clockone(R3, R3MASK, R3TAPS);

}

/* Clock all three of R1,R2,R3, ignoring their middle bits. This is only used for key *setup. */

void clockallthree() {

R1 = clockone(R1, R1MASK, R1TAPS);

R2 = clockone(R2, R2MASK, R2TAPS);

R3 = clockone(R3, R3MASK, R3TAPS);

}

/* Generate an output bit from the current state. You grab a bit from each register via *the output generation taps; then you XOR the resulting three bits. */

bit getbit() {

return parity(R1&R1OUT)^parity(R2&R2OUT)^parity(R3&R3OUT);

}

/* Do the A5/1 key setup. This routine accepts a 64-bit key and a 22-bit frame *number. */

void keysetup(byte key[8], word frame) {

int i;

bit keybit, framebit;

/* Zero out the shift registers. */

R1 = R2 = R3 = 0;

/* Load the key into the shift registers, LSB of first byte of key array first, clocking *each register once for every key bit loaded. (The usual clock control rule is *temporarily disabled.) */

for (i=0; i<64; i++) {

clockallthree(); /* always clock */

keybit = (key[i/8] >> (i&7)) & 1; /* The i-th bit of the

key */

R1 ^= keybit; R2 ^= keybit; R3 ^= keybit;

}

/* Load the frame number into the shift registers, LSB first, clocking each register *once for every key bit loaded. (The usual clock control rule is still disabled.) */

for (i=0; i<22; i++) {

clockallthree(); /* always clock */

framebit = (frame >> i) & 1; /* The i-th bit of the frame #*/

R1 ^= framebit; R2 ^= framebit; R3 ^= framebit;

}

/* Run the shift registers for 100 clocks to mix the keying material and frame number

* together with output generation disabled, so that there is sufficient avalanche.

*We re-enable the majority-based clock control rule from now on. */

for (i=0; i<100; i++) {

clock();

}

/* Now the key is properly set up. */

}

/* Generate output. We generate 228 bits of

* keystream output. The first 114 bits is for

* the A->B frame; the next 114 bits is for the

* B->A frame. You allocate a 15-byte buffer

* for each direction, and this function fills

* it in. */

void run(byte AtoBkeystream[], byte BtoAkeystream[]) {

int i;

/* Zero out the output buffers. */

for (i=0; i<=113/8; i++)

AtoBkeystream[i] = BtoAkeystream[i] = 0;

/* Generate 114 bits of keystream for the

* A->B direction. Store it, MSB first. */

for (i=0; i<114; i++) {

clock();

AtoBkeystream[i/8] |= getbit() << (7-(i&7));

}

/* Generate 114 bits of keystream for the

* B->A direction. Store it, MSB first. */

for (i=0; i<114; i++) {

clock();

BtoAkeystream[i/8] |= getbit() << (7-(i&7));}}

Appendix C - A implementation of COMP128 in C.

/* An implementation of the GSM A3A8 algorithm. (Specifically, COMP128.)

* see www.scard.org for full version

typedef unsigned char Byte;

#include <stdio.h>

/* #define TEST */

* rand[0..15]: the challenge from the base station

* key[0..15]: the SIM's A3/A8 long-term key Ki

* simoutput[0..11]: what you'd get back if you fed rand and key to a real

* SIM.

* The GSM spec states that simoutput[0..3] is SRES,

* and simoutput[4..11] is Kc (the A5 session key).

* (See GSM 11.11, Section 8.16. See also the leaked document

* referenced below.)

* Note that Kc is bits 74..127 of the COMP128 output, followed by 10

* zeros.

* In other words, A5 is keyed with only 54 bits of entropy. This

* represents a deliberate weakening of the key used for voice privacy

* by a factor of over 1000.

* Verified with a Pacific Bell Schlumberger SIM. Your mileage may vary.

* Marc Briceno <marc@scard.org>, Ian Goldberg <iang@cs.berkeley.edu>,

* and David Wagner <daw@cs.berkeley.edu>

void A3A8(/* in */ Byte rand[16], /* in */ Byte key[16],

/* out */ Byte simoutput[12]);

/* The compression tables. */

static const Byte table_0[512] = {

102,177,186,162, 2,156,112, 75, 55, 25, 8, 12,251,193,246,188,

109,213,151, 53, 42, 79,191,115,233,242,164,223,209,148,108,161,

252, 37,244, 47, 64,211, 6,237,185,160,139,113, 76,138, 59, 70,

67, 26, 13,157, 63,179,221, 30,214, 36,166, 69,152,124,207,116,

247,194, 41, 84, 71, 1, 49, 14, 95, 35,169, 21, 96, 78,215,225,

182,243, 28, 92,201,118, 4, 74,248,128, 17, 11,146,132,245, 48,

149, 90,120, 39, 87,230,106,232,175, 19,126,190,202,141,137,176,

250, 27,101, 40,219,227, 58, 20, 51,178, 98,216,140, 22, 32,121,

61,103,203, 72, 29,110, 85,212,180,204,150,183, 15, 66,172,196,

56,197,158, 0,100, 45,153, 7,144,222,163,167, 60,135,210,231,

174,165, 38,249,224, 34,220,229,217,208,241, 68,206,189,125,255,

239, 54,168, 89,123,122, 73,145,117,234,143, 99,129,200,192, 82,

104,170,136,235, 93, 81,205,173,236, 94,105, 52, 46,228,198, 5,

57,254, 97,155,142,133,199,171,187, 50, 65,181,127,107,147,226,

184,218,131, 33, 77, 86, 31, 44, 88, 62,238, 18, 24, 43,154, 23,

80,159,134,111, 9,114, 3, 91, 16,130, 83, 10,195,240,253,119,

177,102,162,186,156, 2, 75,112, 25, 55, 12, 8,193,251,188,246,

213,109, 53,151, 79, 42,115,191,242,233,223,164,148,209,161,108,

37,252, 47,244,211, 64,237, 6,160,185,113,139,138, 76, 70, 59,

26, 67,157, 13,179, 63, 30,221, 36,214, 69,166,124,152,116,207,

194,247, 84, 41, 1, 71, 14, 49, 35, 95, 21,169, 78, 96,225,215,

243,182, 92, 28,118,201, 74, 4,128,248, 11, 17,132,146, 48,245,

90,149, 39,120,230, 87,232,106, 19,175,190,126,141,202,176,137,

27,250, 40,101,227,219, 20, 58,178, 51,216, 98, 22,140,121, 32,

103, 61, 72,203,110, 29,212, 85,204,180,183,150, 66, 15,196,172,

197, 56, 0,158, 45,100, 7,153,222,144,167,163,135, 60,231,210,

165,174,249, 38, 34,224,229,220,208,217, 68,241,189,206,255,125,

54,239, 89,168,122,123,145, 73,234,117, 99,143,200,129, 82,192,

170,104,235,136, 81, 93,173,205, 94,236, 52,105,228, 46, 5,198,

254, 57,155, 97,133,142,171,199, 50,187,181, 65,107,127,226,147,

218,184, 33,131, 86, 77, 44, 31, 62, 88, 18,238, 43, 24, 23,154,

159, 80,111,134,114, 9, 91, 3,130, 16, 10, 83,240,195,119,253

table_1[256] = {

19, 11, 80,114, 43, 1, 69, 94, 39, 18,127,117, 97, 3, 85, 43,

27,124, 70, 83, 47, 71, 63, 10, 47, 89, 79, 4, 14, 59, 11, 5,

35,107,103, 68, 21, 86, 36, 91, 85,126, 32, 50,109, 94,120, 6,

53, 79, 28, 45, 99, 95, 41, 34, 88, 68, 93, 55,110,125,105, 20,

90, 80, 76, 96, 23, 60, 89, 64,121, 56, 14, 74,101, 8, 19, 78,

76, 66,104, 46,111, 50, 32, 3, 39, 0, 58, 25, 92, 22, 18, 51,

57, 65,119,116, 22,109, 7, 86, 59, 93, 62,110, 78, 99, 77, 67,

12,113, 87, 98,102, 5, 88, 33, 38, 56, 23, 8, 75, 45, 13, 75,

95, 63, 28, 49,123,120, 20,112, 44, 30, 15, 98,106, 2,103, 29,

82,107, 42,124, 24, 30, 41, 16,108,100,117, 40, 73, 40, 7,114,

82,115, 36,112, 12,102,100, 84, 92, 48, 72, 97, 9, 54, 55, 74,

113,123, 17, 26, 53, 58, 4, 9, 69,122, 21,118, 42, 60, 27, 73,

118,125, 34, 15, 65,115, 84, 64, 62, 81, 70, 1, 24,111,121, 83,

104, 81, 49,127, 48,105, 31, 10, 6, 91, 87, 37, 16, 54,116,126,

31, 38, 13, 0, 72,106, 77, 61, 26, 67, 46, 29, 96, 37, 61, 52,

101, 17, 44,108, 71, 52, 66, 57, 33, 51, 25, 90, 2,119,122, 35

}, table_2[128] = {

52, 50, 44, 6, 21, 49, 41, 59, 39, 51, 25, 32, 51, 47, 52, 43,

37, 4, 40, 34, 61, 12, 28, 4, 58, 23, 8, 15, 12, 22, 9, 18,

55, 10, 33, 35, 50, 1, 43, 3, 57, 13, 62, 14, 7, 42, 44, 59,

62, 57, 27, 6, 8, 31, 26, 54, 41, 22, 45, 20, 39, 3, 16, 56,

48, 2, 21, 28, 36, 42, 60, 33, 34, 18, 0, 11, 24, 10, 17, 61,

29, 14, 45, 26, 55, 46, 11, 17, 54, 46, 9, 24, 30, 60, 32, 0,

20, 38, 2, 30, 58, 35, 1, 16, 56, 40, 23, 48, 13, 19, 19, 27,

31, 53, 47, 38, 63, 15, 49, 5, 37, 53, 25, 36, 63, 29, 5, 7

}, table_3[64] = {

1, 5, 29, 6, 25, 1, 18, 23, 17, 19, 0, 9, 24, 25, 6, 31,

28, 20, 24, 30, 4, 27, 3, 13, 15, 16, 14, 18, 4, 3, 8, 9,

20, 0, 12, 26, 21, 8, 28, 2, 29, 2, 15, 7, 11, 22, 14, 10,

17, 21, 12, 30, 26, 27, 16, 31, 11, 7, 13, 23, 10, 5, 22, 19

}, table_4[32] = {

15, 12, 10, 4, 1, 14, 11, 7, 5, 0, 14, 7, 1, 2, 13, 8,

10, 3, 4, 9, 6, 0, 3, 2, 5, 6, 8, 9, 11, 13, 15, 12

}, *table[5] = { table_0, table_1, table_2, table_3, table_4 };

void A3A8(/* in */ Byte rand[16], /* in */ Byte key[16],

/* out */ Byte simoutput[12])

{

Byte x[32], bit[128];

int i, j, k, l, m, n, y, z, next_bit;

/* ( Load RAND into last 16 bytes of input ) */

for (i=16; i<32; i++)

x[i] = rand[i-16];

/* ( Loop eight times ) */

for (i=1; i<9; i++) {

/* ( Load key into first 16 bytes of input ) */

for (j=0; j<16; j++)

x[j] = key[j];

/* ( Perform substitutions ) */

for (j=0; j<5; j++)

for (k=0; k<(1<<j); k++)

for (l=0; l<(1<<(4-j)); l++) {

m = l + k*(1<<(5-j));

n = m + (1<<(4-j));

y = (x[m]+2*x[n]) % (1<<(9-j));

z = (2*x[m]+x[n]) % (1<<(9-j));

x[m] = table[j][y];

x[n] = table[j][z];

}

/* ( Form bits to bytes ) */

for (j=0; j<32; j++)

for (k=0; k<4; k++)

bit[4*j+k] = (x[j]>>(3-k)) & 1;

/* ( Permutation but not on the last loop ) */

if (i < 8)

for (j=0; j<16; j++) {

x[j+16] = 0;

for (k=0; k<8; k++) {

next_bit = ((8*j + k)*17) % 128;

x[j+16] |= bit[next_bit] << (7-k);

}

/ * ( At this stage the vector x[] consists of 32 nibbles.

* The first 8 of these are taken as the output SRES. */

/* The remainder of the code is not given explicitly in the

* standard, but was derived by reverse-engineering.*/

for (i=0; i<4; i++)

simoutput[i] = (x[2*i]<<4) | x[2*i+1];

for (i=0; i<6; i++)

simoutput[4+i] = (x[2*i+18]<<6) | (x[2*i+18+1]<<2)

| (x[2*i+18+2]>>2);

simoutput[4+6] = (x[2*6+18]<<6) | (x[2*6+18+1]<<2);

simoutput[4+7] = 0;

}

Appendix D – A Verilog implementation of A5

A5/1 Circuit, expressed in Verilog. The following is equivalent to the purported A5/1

pedagogical C in Appendix A.

a5.v

Purported A5/1 circuit, expressed in Verilog.

13 May 99

David Honig

honig@sprynet.com

Derived from Briceno, Goldberg, Wagner's Pedagogical C Code

of May 99.

To load key: assert Startloading, load data starting on next clock,

bitwise (1 delay + 64 key + 22 frame clocks).

Then wait for Doneloading to be asserted (100 more clocks). Then

harvest your bits.

A testbench and sample output is appended as comments.

This synthesizes to about 150 LCs and runs at 80 Mhz on

the smaller Altera CPLDs e.g., 10K30.

module a5(Clk, Reset_n,

Bitout,

Keybit,

Startloading,

Doneloading);

input Clk, Reset_n;

output Bitout; // output keystream

reg Bitout;

input Keybit; // input keybits 64 + 22

input Startloading; // initial keyload

output Doneloading; // signal done of keyloading

reg Doneloading;

// internal state; lsb is leftmost

reg [18:0] lfsr_1;

reg [21:0] lfsr_2;

reg [22:0] lfsr_3;

reg [1:0] State; // FSM control

reg [6:0] Counter; // for counting steps

reg [2:0] Phase; // for sequencing phases

wire hi_1, hi_2, hi_3;

assign hi_1 = lfsr_1[18];

assign hi_2 = lfsr_2[21];

assign hi_3 = lfsr_3[22];

wire mid1, mid2, mid3;

assign mid1=lfsr_1[8];

assign mid2=lfsr_2[10];

assign mid3=lfsr_3[10];

wire maj;

assign maj=majority(mid1, mid2, mid3);

wire newbit1, newbit2, newbit3;

assign newbit1= ( lfsr_1[13] ^ lfsr_1[16] ^ lfsr_1[17] ^ lfsr_1[18] );

assign newbit2= ( lfsr_2[20] ^ lfsr_2[21] ) ;

assign newbit3= ( lfsr_3[7] ^ lfsr_3[20] ^ lfsr_3[21] ^ lfsr_3[22] );

parameter IDLE=0;

parameter KEYING=1;

parameter RUNNING=2;

always @(posedge Clk or negedge Reset_n) begin

if (!Reset_n) begin: resetting

$display("a5 reset");

Doneloading <=0;

Bitout <=0;

{lfsr_1, lfsr_2, lfsr_3} <= 64'h 0;

{State, Counter, Phase} <=0;

end // reset

else begin

case (State)

IDLE: begin: reset_but_no_key

if (Startloading) begin: startloadingkey

// $display("Loading key starts at %0d ", $time);

State <= KEYING;

{lfsr_1, lfsr_2, lfsr_3} <= 64'h 0;

Phase <=0; Counter<=0;

end // if

end // idle

KEYING: begin

case (Phase)

0: begin: load64andclock

clockallwithkey;

// $display("Loading key bit %0b %0d at %0d %0x", Keybit, Counter, $time, lfsr_1);

if (Counter==63) begin

Counter <=0;

Phase <= Phase +1;

$display(" ");

end

else Counter <=Counter+1;

end

1: begin: load22andclock

// $display("Loading frame bit %0b at %0d %0d %0x", Keybit, Counter, $time, lfsr_1);

clockallwithkey;

if (Counter==21) begin

Counter <=0;

Phase <= Phase +1;

end

else Counter <=Counter+1;

end

2: begin: clock100

majclock;

if (Counter ==100) begin

$display("Done keying, now running %0d\n", $time);

State <= RUNNING;

end

else Counter <= Counter+1;

end

endcase // Phase

end // keying

RUNNING: begin

Doneloading <=1; // when done loading

Bitout <= hi_1 ^ hi_2 ^ hi_3;

majclock;

end // running

endcase // State

end // else not resetting

end // always

function majority;

input a,b,c;

begin

case({a,b,c}) // synopsys parallel_case

3'b 000: majority=0;

3'b 001: majority=0;

3'b 010: majority=0;

3'b 011: majority=1;

3'b 100: majority=0;

3'b 101: majority=1;

3'b 110: majority=1;

3'b 111: majority=1;

endcase

end

endfunction

task clock1;

begin

lfsr_1 <= ( lfsr_1 << 1 ) | newbit1;

end

endtask

task clock2;

begin

lfsr_2 <= (lfsr_2 << 1) | newbit2;

end

endtask

task clock3;

begin

lfsr_3 <= (lfsr_3 << 1) | newbit3;

end

endtask

task clockall;

begin

clock1;

clock2;

clock3;

end

endtask

task clockallwithkey;

begin

lfsr_1 <= ( lfsr_1 << 1 ) | newbit1 ^ Keybit;

lfsr_2 <= ( lfsr_2 << 1 ) | newbit2 ^ Keybit;

lfsr_3 <= ( lfsr_3 << 1 ) | newbit3 ^ Keybit;

end

endtask

task majclock;

begin

if (mid1 == maj) clock1;

if (mid2 == maj) clock2;

if (mid3 == maj) clock3;

end

endtask

endmodule

/**************** CUT HERE FOR TESTBENCH test_a5.v **************************

module test_a5;

reg Clk, Reset_n;

wire Bitout; // output keystream

reg Keybit; // input keybits 64 + 22

reg Startloading; // initial keyload

wire Doneloading; // signal done of keyloading

reg [0:7] key [7:0];

reg [22:0] frame;

a5 dut(Clk, Reset_n,

Bitout,

Keybit,

Startloading,

Doneloading);

always @(Clk) #5 Clk <= ~Clk;

integer i,j;

initial begin

key[0]= 8'h 12;

key[1]= 8'h 23;

key[2]= 8'h 45;

key[3]= 8'h 67;

key[4]= 8'h 89;

key[5]= 8'h AB;

key[6]= 8'h CD;

key[7]= 8'h EF;

frame <= 22'h 134;

Clk <=0;

Reset_n <=1;

Startloading <=0;

Keybit <=0;

#10 Reset_n <=0;

#10 Reset_n <=1;

// key setup

#100

Startloading <=1; $display("Starting to key %0d", $time);

for (i=0; i<8; i=i+1) begin

for (j=0; j<8; j=j+1) begin

#10 Startloading <=0;

Keybit <= key[i] >> j;

end // j

end // i

for (i=0; i<22; i=i+1) begin

#10 Keybit <= frame[i];

end

wait(Doneloading); $display("Done keying %0d", $time);

$write("\nBits out: \n");

repeat (32) #10 $write("%b", Bitout);

$display("\nknown good=\n%b", 32'h 534EAA58);

#1000 $display("\nSim done."); $finish;

end // init

endmodule

Appendix E – C code for Brute force attack on A5

/* Brute force attack on A5, by Eoin Ward eoinward@yahoo.com

*A5 implementation from Briceno, Goldberg, Wagner's Pedagogical C Code

*Optimizations

*Generates one byte first and compares before generating rest of output.

*only generates 64 bits of output instead of the full 228 bits.

#include <stdio.h>

#include <stdlib.h>

#include <conio.h>

#include <time.h>

/* Masks for the three shift registers */

#define R1MASK 0x07FFFF /* 19 bits, numbered 0..18 are all set to 1 */

#define R2MASK 0x3FFFFF /* 22 bits, numbered 0..21 are all set to 1*/

#define R3MASK 0x7FFFFF /* 23 bits, numbered 0..22 are all set to 1*/

/* Middle bit of each of the three shift registers, for clock control */

#define R1MID 0x000100 /* bit 8 --the 8th bit is 1*/

#define R2MID 0x000400 /* bit 10 --the 10th bit is 1*/

#define R3MID 0x000400 /* bit 10 --the 10th bit is 1*/

/* Feedback taps, for clocking the shift registers. These correspond to the primitive

* polynomials x^19 + x^5 + x^2 + x + 1, x^22 + x + 1,

* and x^23 + x^15 + x^2 + x + 1. */

#define R1TAPS 0x072000 /* bits 18,17,16,13 --are one */

#define R2TAPS 0x300000 /* bits 21,20 --are one */

#define R3TAPS 0x700080 /* bits 22,21,20,7 --are one*/

/* Output taps, for output generation */

#define R1OUT 0x040000 /* bit 18 (the high bit) */

#define R2OUT 0x200000 /* bit 21 (the high bit) */

#define R3OUT 0x400000 /* bit 22 (the high bit) */

typedef unsigned char byte;

typedef unsigned long word;

typedef word bit;

/*prototypes missing from orignail code causing warnings*/

bit majority(void);

void clockallthree(void);

void Clock(void);

bit getbit(void);

void test(void);

bit parity(word x);

word clockone(word reg, word mask, word taps);

/* Calculate the parity of a 32-bit word, i.e. the sum of its bits modulo 2 */

bit parity(word x) {

x ^= x>>16; /* ^= Assign bitwise XOR >> shift right */

x ^= x>>8;

x ^= x>>4;

x ^= x>>2;

x ^= x>>1;

return x&1;}

/* Clock one shift register */

word clockone(word reg, word mask, word taps) {

word t = reg & taps;

reg = (reg << 1) & mask;

reg |= parity(t);

return reg;}

/* The three shift registers. */

word R1, R2, R3;

/* Look at the middle bits of R1,R2,R3, take a vote, and

* return the majority value of those 3 bits. */

bit majority() {

int sum;

sum = parity(R1&R1MID) + parity(R2&R2MID) + parity(R3&R3MID);

if (sum >= 2) /* the sum of the 8th bit of */

return 1; /* R1 the 10th bit of r2 and the 10th bit of R3*/

else

return 0;

}

/* Clock two or three of R1,R2,R3, according to their middle bits.*/

void Clock() {

bit maj = majority();

if (((R1&R1MID)!=0) == (int)maj)

R1 = clockone(R1, R1MASK, R1TAPS);

if (((R2&R2MID)!=0) == (int)maj)

R2 = clockone(R2, R2MASK, R2TAPS);

if (((R3&R3MID)!=0) == (int)maj)

R3 = clockone(R3, R3MASK, R3TAPS);

}

/* Clock all three of R1,R2,R3, ignoring their middle bits.

* This is only used for key setup. */

void clockallthree() {

R1 = clockone(R1, R1MASK, R1TAPS);

R2 = clockone(R2, R2MASK, R2TAPS);

R3 = clockone(R3, R3MASK, R3TAPS);

}

/* Generate an output bit from the current state.

* You grab a bit from each register via the output generation taps;

* then you XOR the resulting three bits. */

bit getbit() {

return parity(R1&R1OUT)^parity(R2&R2OUT)^parity(R3&R3OUT);

}

/* Do the A5/1 key setup. This routine accepts a 64-bit key and

* a 22-bit frame number. */

void keysetup(byte key[8], word frame) {

int i;

bit keybit, framebit;

/* Zero out the shift registers. */

R1 = R2 = R3 = 0;

/* Load the key into the shift registers,

* LSB of first byte of key array first,

* clocking each register once for every

* key bit loaded. (The usual clock control rule is temporarily disabled.) */

for (i=0; i<64; i++) {

clockallthree(); /* always clock */

keybit = (key[i/8] >> (i&7)) & 1; /* The i-th bit of the key */

R1 ^= keybit; R2 ^= keybit; R3 ^= keybit;

}

/* Load the frame number into the shift registers, LSB first,

* clocking each register once for every key bit loaded. (The usual clock

* control rule is still disabled.) */

for (i=0; i<22; i++) {

clockallthree(); /* always clock */

framebit = (frame >> i) & 1; /* The i-th bit of the frame #*/

R1 ^= framebit; R2 ^= framebit; R3 ^= framebit;

}

/* Run the shift registers for 100 clocks to mix the keying material and frame *number together with output generation disabled, so that there is sufficient *avalanche. We re-enable the majority-based clock control rule from now on. */

for (i=0; i<100; i++) {

Clock();

}

/* Now the key is properly set up. */

}

/* This generates the first byte in the AtoB stream to for comparision, if it’s the same *the rest of the bits will be generated*/

void runfirstbyte(byte AtoBkeystream[])

{

int i;

for (i=0; i<=63/8; i++) /*zero register first 14 bytes*/

AtoBkeystream[i] = 0;

for (i=0; i<8; i++) {

Clock();

AtoBkeystream[i/8] |= (byte)(getbit() << (7-(i&7)));

}

/*only runs if first byte compares as the same as the know good output

* This only produces the remaining 56 bits not the full 228 bits. */

void run(byte AtoBkeystream[]) {

int i;

for (i=8; i<64; i++) {

Clock();

AtoBkeystream[i/8] |= (byte)(getbit() << (7-(i&7)));

}

/*Code that implements the attack. */

void test() {

word frame = 0x134; /* known frame number*/

byte key[8] = {0x00, 0x00, 0x40, 0x66, 0x88, 0xAA, 0xCC, 0xEE};

byte goodAtoB[8] = { 0x53, 0x4E, 0xAA, 0x58, 0x2F, 0xE8, 0x15,

0x1A}; /* first 64bits of the know good output */

byte AtoB[8];

int i,b1,b2,b3,b4,b5,b6,b7,b8,count=0;

/* big loop to increment each bit in the key*/

for (b8=0;b8<256;b8++)

{

key[7]= (byte)(key[7] + 1);

for (b7=0;b7<256;b7++)

{

key[6]= (byte)(key[6] + 1);

for (b6=0;b6<256;b6++)

{

key[5]= (byte)(key[5] + 1);

for (b5=0;b5<256;b5++)

{

key[4]= (byte)(key[4] + 1);

for (b4=0;b4<256;b4++)

{

key[3]= (byte)(key[3] + 1);

for (b3=0;b3<256;b3++)

{

key[2]= (byte)(key[2] + 1);

printf(" %d tests so far\n" ,(65536*count));

count=count+1; /*prints number of keys tested*/

for( b2=0;b2<256;b2++)

{

key[1]= (byte)(key[1] + 1);

for( b1=0;b1<256;b1++)

{

int failed=0;

key[0] = (byte)(key[0] + 1);

keysetup(key, frame);

runfirstbyte(AtoB); /*generate first byte and test*/

if (AtoB[0] != goodAtoB[0])

failed = 1;

if(failed){}else{

run(AtoB); /* if same generate the rest*/

for (i=1; i<8; i++)

if (AtoB[i] != goodAtoB[i])

failed = 1;

if (failed) {} else{ /*if = 1 try next key else print out key*/

printf("key found.\n");

printf("key: 0x");

for (i=0; i<8; i++)

printf("%02X", key[i]);

printf("\n frame number: 0x%06X\n", (unsigned int)frame);

printf("known good output(64bits only): \n");

printf(" A->B: 0x");

for (i=0; i<4; i++)

printf("%02X", goodAtoB[i]);

printf("\n observed output:\n");

printf(" A->B: 0x");

for (i=0; i<4; i++)

printf("%02X", AtoB[i]);

printf("\n");

return;

}}

}

int main(void) {

long starttime;

long endtime;

time( (time_t*) &starttime ); /*sets a start time*/

test(); /*Runs Attack*/

time( (time_t*) &endtime);

printf("Elapsed: %d", (int)endtime - starttime);

/* output the time taken for the attack in seconds very little over head as it sets the sarttime at the beign and just takes it away form the end time no counting involved*/

getch();

return 0;

}

Round No.	Number of circular shifts
1,2,9,16	One
3,4,5,6,7,8,10,11,12,13,14,15	Two