Perfect secrecy, or not?

Cryptography is the science of designing systems that can withstand malicious attempts to abuse them. Every cryptographic scenario can be illustrated by the story of security’s inseparable couple, Alice and Bob: Alice and Bob want to send messages to each other, while deterring various unwanted interlocutors, eavesdroppers, and tamperers from participating.

In the simplest model, Alice sends Bob a secret message, and Eve the eavesdropper attempts to decode Alice’s message. Alice’s goal is to encrypt the message in such a way that Bob can decrypt but Eve cannot. The formal study of such communication protocols must begin with the construction of a cryptographic model, which we can then analyze to determine its functionality and security.

The formalization of the Alice–Eve–Bob secnario consists of the following:

• A set $\mathcal{M}$ of messages, and another set $\mathcal{C}$ of ciphertexts;
• Alice has an encryption algorithm $\operatorname{Enc}(m)$ that takes $m \in \mathcal{M}$ and outputs $c \in \mathcal{C}$;
• Bob has a decryption algorithm $\operatorname{Dec}(c)$ that takes $c \in \mathcal{C}$ and outputs $m \in \mathcal{M}$;
• Eve has her own decryption algorithm $\mathcal{A}(c)$ that takes $c \in \mathcal{C}$ and outputs $m \in \mathcal{M}$.

In order for the above model to be functional, we expect to have $\operatorname{Dec}(\operatorname{Enc}(m)) = m$ for all $m \in \mathcal{M}$. Security would mean that $\mathcal{A}(\operatorname{Enc}(m)) \neq m$ for all $m \in \mathcal{M}$, or, at least, for a large portion of $m \in \mathcal{M}$.

This naïve definition of functionality and security quickly turns out to be hopeless. History has shown that one party can often find out which encryption method the other party is using through espionage and other tactics. For Alice to guarantee the security of her communication with Bob, it is imperative for her to use an encryption method that produces messages that cannot be decrypted even when the encryption algorithm is leaked to Eve.

To this end, we construct an encryption method with a secret key, without which encryption messages cannot be decrypted with ease. We update the above cryptographic model accordingly:

• We now have a set $\mathcal{M}$ of messages, a set $\mathcal{C}$ of ciphertexts, and a set $\mathcal{K}$ of keys.
• There is a key-generation algorithm $\operatorname{Gen}$ that outputs an element of $\mathcal{K}$.
• Alice is equipped with an encryption algorithm $\operatorname{Enc}(k,m) = \operatorname{Enc}_k(m)$ that takes $(k,m) \in \mathcal{K} \times \mathcal{M}$ and outputs $c \in \mathcal{C}$.
• Bob is equipped with a decryption algorithm $\operatorname{Dec}(k,c) = \operatorname{Dec}_k(c) = m$ that takes $(k,c) \in \mathcal{K} \times \mathcal{C}$ and outputs $m \in \mathcal{M}$.
• Even is equipped with another decryption algorithm $\mathcal{A}(k,c)$ that takes $(k,c) \in \mathcal{K} \times \mathcal{C}$ and outputs $m \in \mathcal{M}$.

In this context, it is reasonable to say that the model is functional if

for all $(k,m) \in \mathcal{K} \times \mathcal{M}$. In other words, Bob should be able to decrypt any message that Alice encrypted, so long as both of them use the same key.

Is this new model secure? We should hope that Eve cannot, in general, guess the key that Alice and Bob choose to use. Therefore, it is reasonable to assume that $\operatorname{Gen}$ is a randomized algorithm.

Moreover, if Eve fails to obtain the correct key, then she should not be able to recover the original message. For this, a typical ciphertext must not carry any inherent meaning on its own, thereby deterring Eve from deciphering it without a key.

We formalize these observations as follows:

Definition 1 (Shannon, 1949). A cryptographic system $(\operatorname{Gen}, \operatorname{Enc}, \operatorname{Dec})$ is perfectly secret if, for each probability distribution $\mathcal{D}$ over $\mathcal{M}$, we have the identity

for all choices of $(\bar{m},\bar{c}) \in \mathcal{M} \times \mathcal{C}$. In other words, the probability of recovering a fixed message $\bar{m}$ does not depend on the choice of the ciphertext $\bar{c}$.

An implementation of Shannon’s perfect secrecy model is the one-time pad algorithm. To state the algorithm, we recall from Section 1.10 that $\mathcal{F}_2 = \{0,1\}$ denotes the finite field of size 2.

Theorem 2 (Shannon one-time pad, 1949). Fix $n \in \mathbb{N}$ and let $\mathcal{M} = \mathcal{C} = \mathcal{K} = \mathbf{F}_2^n$, the $n$-fold cartesian product of $\mathbf{F}_2$. We define $\operatorname{Gen}$ to be the algorithm that chooese $k$ uniformly from $\mathcal{K}$. Let $\oplus$ denote the coordinatewise addition on $\mathbf{F}_2^n$

and define

for all $(m,c,k) \in \mathcal{M} \times \mathcal{C} \times \mathcal{K}$. The resulting system $(\operatorname{Gen}, \operatorname{Enc}, \operatorname{Dec})$, called the one-time pad, is a perfectly secret cryptographic system.

The one-time pad algorithm (OTP) is one of the earliest known examples of encryption methods with a secret key. Miller was the first person to describe OTP formally (Miller 1882). Shannon, on the other hand, was the first to prove formally the security of OTP (Shannon 1949.

OTP is an example of a symmetric-key encryption scheme, as the same key is used both for the encryption process and the decryption process. The above theorem shows that OTP is essentially unbreakable, but OTP is not without problems. We first note that a key $k$ must be just as large as the message $m$ is used. In fact, this condition cannot be altered:

Theorem 3 (Shannon, 1949). In every perfectly secret cryptographic system, $\vert \mathcal{K} \vert \geq \vert \mathcal{M} \vert$.

Worse, a key can never be recycled. Indeed, if $c_1 = m_1 \oplus k$ and $c_2 = m_2 \oplus k$, then

From $m_1 \oplus m_2$, the individual messages can be obtained with reasonable certainty. A historical example is the Venona project:

One-time pads used properly only once are unbreakable; however, the KGB’s cryptographic material manufacturing center in the Soviet Union apparently reused some of the pages from one-time pads. This provided Arlington Hall with an opening. Very few of the 1942 KGB messages could be solved because there was very little duplication of one-time pad pages in those messages. The situation was more favorable in 1943, even more so in 1944, and the success rate improved accordingly. (Benson 2011)

In short, it is hopeless to aim for perfect secrecy. We shall have to settle for something less while maintaining reasonable level of security, which is the underlying theme of cryptography.

We conclude the section with proofs of the above theorems. To this end, we shall make use of a technical lemma.

Lemma 4. The perfect secrecy condition holds if and only if

for all choices of $m_0,m_1 \in \mathcal{M}$ and $\bar{c} \in \mathcal{C}$. In other words, the probabilities of obtaining the same ciphertext $\bar{c}$ from two different messages $m_0$ and $m_1$ are the same.

We defer the proof of the lemma and establish the theorems.

For OTP,

for all choices of $(m,k) \in \mathcal{K}$, whence OTP is functional.

To check that OTP is ecure, we observe that, for an arbitrary choice of a probability distribution $\mathcal{D}$ over $\mathcal{M}$,

regardless of the choice of $(\bar{m},\bar{c}) \in \mathcal{M} \times \mathcal{C}$. Therefore,

for all choices of $m_0,m_1 \in \mathcal{M}$ and $\bar{c} \in \mathcal{C}$. Perfect secrecy of OTP now follows from the lemma.

To show that any perfectly secret cryptographic system must have keys at least as large as the message, we suppose that $% $. We fix $(m_0,k_0) \in \mathcal{M} \times \mathcal{K}$, set $c_0 = \operatorname{Enc}_{k_0}(m_0)$ and consider the set

Since

we can find $m_1 \in \mathcal{M} \smallsetminus \mathcal{N}$. Now,

but

and so it follows from the lemma that the cryptographic system in question is not perfectly secret.

Let us return to the proof of the technical lemma, which we restate below for convenience.

Lemma 4. The perfect secrecy condition holds if and only if

for all choices of $m_0,m_1 \in \mathcal{M}$ and $\bar{c} \in \mathcal{C}$. In other words, the probabilities of obtaining the same ciphertext $\bar{c}$ from two different messages $m_0$ and $m_1$ are the same.

For any choice of $(\bar{m},\bar{c}) \in \mathcal{M} \times \mathcal{C}$, we have that

Therefore,

if and only if

To this end, we observe that

and that

Since probabilities are always nonnegative,

if and only if

regardless of the choice of $m_0 \in \mathcal{M}$. This holds if and only if

and the proof of the lemma is now complete. $\square$

We have now seen in that perfect secrecy is exceedingly difficult to achieve. Nevertheless, it is merely sufficient to prevent Eve the eavesdropper from efficiently decrypting the message. To formalize this notion, we must make sense of what it means for Eve to be computationally bounded.

Computational boundedness is, in essence, the restriction on the ability to carry out computations that take too long to run. What, then is computation? We turn to Alan Turing, the father of theoretical computer science (Turing 1950):

The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer. The human computer is supposed to be following fixed rules; he has no authority to deviate from them in any detail. We may suppose that these rules are supplied in a book, which is altered whenever he is put on to a new job. He has also an unlimited supply of paper on which he does his calculations. He may also do his multiplications and additions on a “desk machine,” but this is not important.

If we use the above explanation as a definition we shall be in danger of circularity of argument. We avoid this by giving an outline of the means by which the desired effect is achieved. A digital computer can usually be regarded as consisting of three pats:

• Store
• Executive unit
• Control

The store is a store of information, and corresponds to the human computer’s paper, where this is the paper on which he does his calculations or that on which his book of rules is printed. In so far as the human computer does calculations in his bead a part of the store will correspond to his memory.

The executive unit is the part which carries out the various individual operations involved in a calculation. What these individual operations are will vary from machine to machine. Usually fairly lengthy operations can be done such as “Multiplicy 3540675445 by 7076345687” but in some machines only very simple ones such as “Write down 0” are possible.

We have mentioned that the “book of rules” supplied to the computer is replaced in the machine by a part of the store. It is then called the “table of instructions.” It is the duty of the control to see that these instructions are obeyed correctly and in the right order. The control is so constructed that this necessarily happens.

In short, computation can be modeled with a theoretical machine consisting of three parts:

• infinitely-long tapes with discrete cells, one of which containing input values and the rest providing read-write workspace areas,
• a state register that specifies the state of the machine at each step, and
• heads that can, in accordance with the state of the machine and the recorded symbols on tapes, read and write symbols on tapes, as well as moving tapes left or right one cell at a time.

On such a model, computational boundedness is merely a restriction on how many times the tapes can be moved before a computational task is considered too difficult. It is, then, worthwhile to formalize the model into a precise mathematical construct:

Definition 5. (Turing, 1950). For a fixed $k \in \mathbb{N}$, a $k$-tape Turing machine is defined to be an ordered triple $M = (\Gamma, Q, \delta)$ consisting of the following:

• A set $\Gamma$ of symbols, called the alphabet of $M$. Typically, we assume that $\Gamma$ contains, at least, 0, 1, blank, and start.
• A set $Q$ of states for $M$. Typically, we assume that $Q$ contains, at least, start and halt.
• A transition function

that, based on the current state of $M$ and the $k$ symbols at the current locations of the heads, produces output to be recorded on the $k-1$ workspace tapes and moving instructions for the $k$ heads.

We are, of course, interested in computational processes that terminate in a finite number of steps.

Definition 6. A $k$-tape Turing Machine $M = (\Gamma,Q,\delta)$ is said to be a deterministic algorithm, or simply an algorithm, if, for each input $x = (\texttt{start},\gamma_1,\ldots,\gamma_k) \in Q \times \Gamma^k$ that starts off $M$ at the start state, there exists a positive integer $n$ such that

produces the halt state. In other words, a deterministic algorithm always halts after following the instructions provided by the transition function finitely many times, regardless of the initial configuration of the input tapes.

The image to keep in mind is as follows: we put $k$ tapes into the Turing machine $M$ as the input, $M$ modifies the tapes until it hits the halt state, and the final configuration of the tapes is printed as the output.

We recall, however, that we are interested in computational efficiency, not mere computability. In order to formalize the notion of computational boundedness, we must work out what it means for an algorithm to have a certain running time.

It is typical to consider bit strings such as

as representations of data. Therefore, it makes sense to be able to refer to the collection of all finite bit strings:

Definition 7. Let $\{0,1\}^n$ denote the $n$-fold cartesian product of the bit set $\{0,1\}$. We define

the set of all finite bit strings. Given a bit string $x \in \{0,1\}^*$, we define its length $\vert x \vert$ to be the unique $n \in \mathbb{N}$ such that

With bit strings as our representation of data, it makes sense to think of a computational task as a function on $\{0,1\}^*$, i.e., a process that outputs a unique bit string for each input bit string. This turns out to be a sufficient abstraction for defining the notion of computational efficiency.

Definition 8. Let $f:\{0,1\}^* \to \{0,1\}^*$ and $T:\mathbb{N} \to \mathbb{N}$. We say that a Turing machine $M$ computes $f$ in $T$-time if, for every $x \in \{0,1\}^*$, the Turing machine $M$ initialized to the start configuration on input $x$ halts with $f(x)$ as its output within $T(\vert x \vert)$ steps.

In other words, we can use a function $T$ to provide an upper bound on the run time of a Turing machine $M$ computing $f$. Computational tasks in real life often takes longer with larger input data, and so it makes sense to have the $T$-time depends on the size $\vert x \vert$ of the input bit string.

In fact, it would make sense to have $T$ grow with its input size, at a rate sufficiently fast that the Turing machine $M$ is always given the time to read the input. Moreover, we would want $T$ itself to be efficiently computable as well, for otherwise we cannot make use of the information on computational boundedness with ease. We collect these desirable properties into the following definition:

Definition 9. A function $T:\mathbb{N} \to \mathbb{N}$ is time constructible if $T(n) \geq n$ for all $n \in \mathbb{N}$ and if there exists a Turing machine that computes the function $x \mapsto \operatorname{bin}(T(x))$, the binary representation of $T(x)$, in $T$-time.

Examples of time constructible functions include $n \log n$, $n^3$, and $2^n$.

Let us now define what it means for a function to be efficiently computable.

Definition 10. We define $\mathsf{poly}$ to be the set of all time-constructible functions $T:\mathbb{N} \to \mathbb{N}$ such that $T(n) = O(n^c)$ for some $c > 0$. A function $f:\{0,1\}^* \to \{0,1\}^*$ is said to be computable in polynomial time, or efficiently computable, if there exists a Turing machine $M$ and a function $T \in \mathsf{poly}$ such that $M$ computes $f$ in $T$-time.

We often write $\mathsf{poly}(n)$ to denote a fixed, unspecified element of $\mathsf{poly}$.

Examples of efficiently computable functions include the usual sorting algorithms, primality testing, Fast Fourier transform, and so on.

It is often useful to allow our model for computation to make use of randomness. For example, quicksort with a randomized pivot often performs a lot better than quicksort with a median-of-medians pivot, even though the latter has better worst-case runtime than the former. Some widely-used computational methods, such as the Monte Carlo methods, are always probabilistic in nature and do not have non-probabilistic analogues.

In light of this, we define a probabilistic analogue of the Turing machine.

Definition 11 (de Leeuw–Moore–Shannon–Shapiro, 1970). A probabilistic Turing machine is an ordered quadruple $M = (\Gamma,Q,\delta_1,\delta_2)$ consisting of a set $\Gamma$ of symbols, a set $Q$ of states, and two transition functions $\delta_1$ and $\delta_2$: see Definition 2.2.1. Given an input, probabilistic Turing machine $M$ is executed by randomly applying $\delta_1$ and $\delta_2$ with equal probability.

Probabilistic Turing machine $M$ is said to be a probabilistic algorithm if, for each input

there exists a positive integer $n$ such that

produces the halt state. Here, each $i_k$ is randomly chosen to be either 1 or 2.

The concept of computational efficiency for Turing machines can be carried over to the context of probabilistic Turing machines with minor modifications.

Definition 12. A probabilistic Turing machine $M$ computes $f:\{0,1\}^* \to \{0,1\}^*$ in $T$-time for some $T:\mathbb{N} \to \mathbb{N}$ if, for every choice of bit string $x \in \{0,1\}^*$, the probabilistic Turing machine $M$ initialized to the start state on input $x$ halts after at most $T(\vert x \vert)$ steps with $f(x)$ as the output, regardless of the random choices made within $M$.

Definition 13. A function $f:\{0,1\}^* \to \{0,1\}^*$ is said to be computable in probabilistic polynomial time if there exists a probabilistic Turing machine $M$ and a function $T \in \mathsf{poly}$ such that $M$ computes $f$ in $T$-time.

Regular Turing machines are sometimes called deterministic Turing machines to emphasize their difference with probabilistic Turing machines. Similarly, computability in polynomial time is often referred to as deterministic polynomial time.

With the language of computational complexity theory at hand, we can now formalize the notion of a process that is easy to carry out but difficult to revert. To this end, we introduce two preliminary definitions.

Definition 14. $U_n$ denotes a random variable distributed uniformly over $\{0,1\}^n$, i.e.,

whenever $\alpha \in \{0,1\}^n$ and equals zero otherwise.

Definition 15. $0^n$ refers to a bit string of length $n$, consisting entirely of 0. Similarly, $1^n$ refers to a bit string of length $n$, consisting entirely of 1.

We are now ready to give the definition of a one-way function.

Definition 16 (Diffie–Hellman, 1976). A function $f:\{0,1\}^* \to \{0,1\}^*$ is said to be one-way if the following conditions hold:

• $f$ is easy to compute, i.e., $f$ is computable in deterministic polynomial time.
• $f$ is difficult to invert, i.e., for each probabilistic polynomial-time algorithm $\mathcal{A}$ and every polynomial $p$,

for all sufficiently large $n$.

Why is the auxiliary input $1^n$ needed? Without it, a function can be considered one-way by merely shrinking its input: if the image is very small, an inverting algorithm simply would not have enough time with respect to the size of its input—i.e., the shrunk output of the original function—to have good computational complexity.

The existence of a one-way function has not been proven. In fact, an existence proof would settle the famous P versus NP problem. There are, however, plausible candidates for one-way functions, having withstood many attempts at producing efficient inverting algorithms.

The most famous example is the integer factorization problem, which is widely believed to be difficult. State-of-the-art factoring algorithms such as general number field sieve runs in subexponential time. In the language of one-way functions, the multiplication function

is conjectured to be a one-way function.

With this assumption, we can build the famous RSA cryptosystem, which builds on the difficulty of the integer factorization problem.

See Goldreich’s two-volume monograph for more information on the foundations of cryptography.

Tags:

Categories:

Updated: