Instructor: Prof. Jonathan Katz
Book: “Introduction to Modern Cryptography” by Katz and Lindell
ROUGH NOTES (!)
Updated: 5/3/24
Week-1:
Classical Cryptography focuses exclusively on ensuring secret communication between ${ 2 }$ parties sharing secret information in advance (That is, the focus is on private-key encryption schemes).
Private-key encryption schemes enable ${ 2 }$ parties who shared some secret information in advance to communicate over a public channel, while keeping the contents of their communication private from any adversarial eavesdropper.
Modern Cryptography has a broader scope, dealing with:
- Ensuring data privacy (as in Classical Cryptography)
- Ensuring data integrity (i.e. ensuring data is not modified improperly), user authentication
- Public-key encryption (where communicating users do not share any secret information in advance)
- Foundational topics like Number Theory and Information Theory
- Systems like electronic voting, distributed e-cash, etc.
Modern Cryptography deals with design, analysis, implementation of techniques for securing information, systems and computation (all very general terms) against adversarial attack.
Eg: (Lecture at 17:00) Consider a framework for a doctor to digitally access and change medical records.
Ideally: Doctor (digitally) asks for a patient Alice’s record. Hospital (digitally) sends Alice’s record ${ R _A .}$ Doctor asks to update Alice’s record to ${ S _A .}$ Hospital updates the record.
Some security concerns in this example are:
- [Authenticity and Integrity] From Doctor point of view: Was ${ R _A }$ really sent by the hospital? Even if so, was it modified en route? From Hospital point of view: Is the retrieve request really from Alice’s doctor? Is the store request really from Alice’s doctor? Even if so, was ${ S _A }$ modified en route?
- [Privacy] Eavesdroppers shouldn’t get Alice’s record.
Some uses of Cryptography:
- Passwords, password hashing (Secure login passwords from being eavesdropped)
- Secure credit-card transactions over the web
- Encrypted WiFi
- Disk encryption
- Digitally signed software updates
- Bitcoin
Threat modelling and proofs of security form the modern crypto mindset.
Two important cryptographic goals are secrecy and integrity.
In the private-key setting, secrecy is ensured by private-key encryption and integrity is ensured by message authentication codes.
In the public-key setting, secrecy is ensured by public-key encryption and integrity is ensured by digital signatures.
Building blocks:
- Pseudorandom number generators and Pseudorandom functions
- Hash functions
- Number theory
As mentioned, a private-key/symmetric-key encryption scheme relies on secret information (a key) shared between the communicating parties. Say we have Alice and Bob communicating in a private-key setting. It typically works like:
- Alice and Bob share a secret key ${ k }$ in advance.
- When Bob has a message/plaintext for Alice, he encrypts the message using the shared key (and an encryption algorithm). This generates a ciphertext which he sends to Alice over a public communication channel.
- Alice can then decrypt the ciphertext using the shared key (and a decryption algorithm) to recover the original message.
We aim for a security guarantee that no eavesdropper who observes the ciphertext sent can figure out anything about the underlying message.
Def: Formally, a private-key encryption scheme is defined by a message space ${ \mathcal{M} }$ (of messages supported by the scheme) along with algorithms ${ (\texttt{Gen, Enc, Dec}) }$ where:
- ${ \texttt{Gen} }$ (key generation algorithm): Randomized algorithm, generates ${ k }$
- ${ \texttt{Enc} }$ (encryption algorithm): Takes a key ${ k }$ and message ${ m \in \mathcal{M} }$ as input, outputs ciphertext ${ c .}$ We write ${ c \leftarrow \texttt{Enc} _k (m) }.$
Left arrow is used to denote assignment of output when an algorithm is randomized.
- ${ \texttt{Dec} }$ (decryption algorithm): Takes a key ${ k }$ and ciphertext ${ c }$ as input, outputs ${ m }$ or “error”. We write ${ m := \texttt{Dec} _k (c) .}$
We also have a correctness condition: For all ${ m \in \mathcal{M} }$ and all ${ k }$ output by ${ \texttt{Gen} ,}$ we must have \({ \texttt{Dec} _k (\texttt{Enc} _k (m)) = m .}\)
Eg: Consider the shift cipher (useless, just to illustrate the definition):
- Message space consists of strings of english letters (with punctuation removed and no distinction between upper and lower case). On equating the english alphabet with the set ${ \lbrace 0, 1, \ldots, 25 \rbrace }$ (so for eg ${ \texttt{a} = 0, \texttt{b} = 1, }$ etc.) the message space ${ \mathcal{M} }$ consists of all finite sequences of numbers from this set.
- Algorithm ${ \texttt{Gen} }$ outputs a uniform key ${ k \in \lbrace 0, 1, \ldots, 25 \rbrace .}$
- Encryption of a message ${ m = m _1 \ldots m _{\ell} }$ (each ${ m _i \in \lbrace 0, 1, \ldots, 25 \rbrace }$) using key ${ k }$ is \({ \texttt{Enc} _k (m) = c _1 \ldots c _{\ell} \text{ where } c _i = [(m _i + k) \, \text{mod } 26]. }\)
- Decryption of a ciphertext ${ c = c _1 \ldots c _{\ell} }$ using key ${ k }$ is \({ \texttt{Dec} _k (c) = m _1 \ldots m _{\ell} \text{ where } m _i = [(c _i - k) \, \text{mod } 26]. }\)
The shift cipher is not secure, there are only ${ 26 }$ possible keys. Given a ciphertext, an attacker (who knows everything about the scheme except the key) can simply try decrypting using every possible key (and of the ${ 26 }$ candidate plaintexts, typically one which “makes sense” is the actual plaintext).
Kerckhoff’s principle is that an encryption scheme used is not to be considered secret, but the only secret information is the key shared by the parties. Some reasons are:
- It is easier to maintain secrecy of a short key rather than a more complex algorithm.
- It is important that encryption schemes receive public scrutiny before widespread deployment. So encryption schemes should be public.
The key space ${ \mathcal{K} }$ of an encryption scheme is the set of all keys that can be output by the key generation algorithm. For an encryption scheme to be secure, it must have a key space large enough to prevent an exhaustive search attack like in the shift cipher (although a large key space alone doesn’t guarantee security).
Eg: Consider the Vigenère cipher. The key is now a string. To encrypt, put the key above the message (repeat the key as needed), and shift each character by the amount dictated by the adjacent character of the key. Decryption is the reverse process.
Key space of this cipher very large. For eg if keys are ${ 14 }$ character strings, then key space has size ${ 26 ^{14} \, (\geq (2 ^4) ^{14} = 2 ^{56}). }$ Exhaustive search attack is infeasible, but it is vulnerable to frequency analysis. (For simplicity, assume the attacker knows the key length is ${ 14 .}$ Now every ${ 14 }$th character in the ciphertext has been shifted by the same amount. Comparing letter frequencies in this substring and in generic english texts, we get candidates for the shifting character, namely ${ 0 }$th character of the key. One can continue so). This attack will only work given a long enough ciphertext. Also better attacks are known. In any case, the Vigenere cipher is not very secure despite having a large key space.
Modern cryptography differs from classical cryptography in that the emphasis is on threat models and proofs of security.
With a few exceptions, cryptography currently relies on some (unproved) complexity assumptions, on the impossibility of solving certain problems efficiently.
Some threat models in a private-key encryption scheme are (in increasing order of strength):
- Ciphertext-only attack (Here the attacker only gets to observe ciphertext(s) sent by the parties and nothing else)
- Known-plaintext attack (Here in addition to above the attacker is able to learn one or more plaintext-ciphertext pairs generated using the same key)
Attacker’s goal then is to deduce information about the underlying plaintext of some other ciphertext, produced using the same key.
- Chosen-plaintext attack (Here in addition to above the attacker can obtain plaintext-ciphertext pairs for plaintexts of its choice)
- Chosen-ciphertext attack (Here in addition to above the attacker can obtain decryptions of ciphertexts of its choice)
For now consider the simplest threat model, an eavesdropper making a ciphertext-only attack. How should we define security?
One informal definition is: An encryption scheme is secure if, regardless of any prior info the attacker has of the plaintext, the ciphertext can leak no additional info about the plaintext.
Consider an encryption scheme ${ (\texttt{Gen, Enc, Dec}),}$ with message space ${ \mathcal{M} .}$ Consider the randomized experiment:
- Choose a message ${ m \in \mathcal{M} }$ according to some distribution
- Generate a key ${ k }$ using ${ \texttt{Gen} }$
- Compute ${ c \leftarrow \texttt{Enc} _k (m) .}$
This defines a distribution on the ciphertext. Let ${ C }$ be the random variable denoting the value of the ciphertext in this experiment.
Def: Encryption scheme ${ (\texttt{Gen, Enc, Dec}) }$ with message space ${ \mathcal{M} }$ and ciphertext space ${ \mathcal{C} }$ is perfectly secret if for every prior distribution over ${ \mathcal{M} ,}$ every ${ m \in \mathcal{M}, }$ and every ${ c \in \mathcal{C} }$ with ${ \mathbb{P}(C = c) > 0 ,}$ we have
\[{ \mathbb{P}[M = m \, \vert \, C = c] = \mathbb{P}[M = m] .}\]Eg: The one time pad is an encryption scheme with perfect secrecy (although less practical due to requiring long keys). In the scheme:
- ${ \mathcal{M} = \lbrace 0, 1 \rbrace ^n }$
- ${ \texttt{Gen} }$: choose a uniform key ${ k \in \lbrace 0, 1 \rbrace ^n }$
- ${ \texttt{Enc} }$: ${ \texttt{Enc} _k (m) = k \oplus m }$ (bit-wise XOR)
- ${ \texttt{Dec} }$: ${ \texttt{Dec} _k (c) = k \oplus c .}$
Thm: One time pad is perfectly secret.
Sketch: Fix an arbitrary distribution over ${ \mathcal{M} = \lbrace 0, 1 \rbrace ^n ,}$ and arbitrary ${ m, c \in \lbrace 0, 1 \rbrace ^n .}$
Now
with terms
\[{ \begin{align*} \mathbb{P}[C = c \, \vert \, M = m ] &= \mathbb{P}[K = c \oplus m \, \vert \, M = m ] \\ &= \mathbb{P}[K = c \oplus m] \\ &= 2 ^{-n} \end{align*} }\]and
\[{ \begin{align*} \mathbb{P}[C = c] &= \sum _{m ^{'}} \mathbb{P}[C = c \, \vert \, M = m ^{'}] \, \mathbb{P}[M = m ^{'}] \\ &= \sum _{m ^{'}} 2 ^{-n} \, \mathbb{P}[M = m ^{'}] \\ &= 2 ^{-n} .\end{align*} }\]Hence
\[{ \mathbb{P}[M = m \, \vert \, C = c] = \mathbb{P}[M = m] }\]as needed.
Random number generation:
When describing algorithms, we often assume access to uniform random bits. Where do these actually come from?
One way computers do it is:
- Continually collect a pool of high-entropy (roughly, “unpredictable”) data.
- When random bits are needed, process this data to generate independent, uniform sequence of bits.
For the pool of high-entropy data, it can come from:
- External inputs (Delays between network events, Hard-disk access times, etc)
- Hardware random-number generation (eg some Intel chips)
Limitations of one time pad:
- The key (to be shared apriori) is as long as the message.
- Only secure if each key is used to encrypt a single message (if two messages ${ m _1, m _2 }$ are encrypted using same key ${ k ,}$ then the adversary can XOR the ciphertexts to get ${ (m _1 \oplus k) \oplus (m _2 \oplus k) }$ ${ = m _1 \oplus m _2 ,}$ which leaks info about ${ m _1 }$ and ${ m _2 }$).
Thm: If an encryption scheme ${ (\texttt{Gen, Enc, Dec}) }$ with message space ${ \mathcal{M} }$ is perfectly secret, then key space is atleast as large as the message space, that is ${ \vert \mathcal{K} \vert \geq \vert \mathcal{M} \vert .}$
So if ${ \mathcal{M} = \lbrace 0, 1 \rbrace ^m }$ and ${ \mathcal{K} = \lbrace 0, 1 \rbrace ^n, }$ we must have ${ n \geq m }$ i.e. keys are atleast as long as the messages.
Sketch: Suppose ${ \vert \mathcal{K} \vert < \vert \mathcal{M} \vert .}$ Pick the uniform distribution over ${ \mathcal{M} , }$ and let ${ c \in \mathcal{C} }$ be a ciphertext which occurs with nonzero probability. Let ${ \mathcal{M}(c) }$ be the set of all decryptions of ${ c ,}$ that is
\[{ \mathcal{M}(c) := \lbrace \texttt{Dec} _k (c) \, \vert \, k \in \mathcal{K} \rbrace . }\]Since ${ \texttt{Dec} }$ is deterministic, ${ \vert \mathcal{M}(c) \vert \leq \vert \mathcal{K} \vert .}$ Since ${ \vert \mathcal{K} \vert < \vert \mathcal{M} \vert ,}$ there is a message ${ m ^{‘} \in \mathcal{M} }$ with ${ m ^{‘} \notin \mathcal{M}(c) .}$ But now
\[{ \mathbb{P}[M = m ^{'} \, \vert \, C = c ] = 0 \neq \mathbb{P}[M = m ^{'}] }\]so the scheme is not perfectly secret, a contradiction.
Def: Let ${ \Pi = (\texttt{Gen, Enc, Dec}) }$ be an encryption scheme with message space ${ \mathcal{M} .}$ Let ${ \mathcal{A} }$ be an adversary (eavesdropper). We define an experiment ${ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} }$ as follows:
Adversarial indistinguishability experiment ${ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} }$:
- Adversary ${ \mathcal{A} }$ outputs a pair of messages ${ m _0, m _1 \in \mathcal{M} .}$
- A key ${ k }$ is generated using ${ \texttt{Gen} ,}$ and a uniform bit ${ b \in \lbrace 0, 1 \rbrace }$ is chosen. Ciphertext ${ c \leftarrow \texttt{Enc} _k (m _b) }$ is computed and given to ${ \mathcal{A} .}$
We call ${ c }$ the challenge ciphertext.
- ${ \mathcal{A} }$ outputs a bit ${ b ^{‘} . }$
- Output of the experiment is ${ 1 }$ (and we write ${ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} = 1 }$) if ${ b ^{‘} = b ,}$ and output is ${ 0 }$ otherwise.
Def: Encryption scheme ${ \Pi = (\texttt{Gen, Enc, Dec}) }$ with message space ${ \mathcal{M} }$ is perfectly indistinguishable if for every adversary ${ \mathcal{A} }$ we have
\[{ \mathbb{P}\left[ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} = 1 \right] = \frac{1}{2}. }\]Thm: An encryption scheme is perfectly secret if and only if it is perfectly indistinguishable.
Sketch: (To Do)
Week-2:
We can relax the definition of perfect indistinguishability to get the definition of computational secrecy.
There are two approaches for the relaxation: Concrete and Asymptotic.
Computational indistinguishability (concrete approach):
Def: Encryption scheme ${ \Pi }$ is ${ (t, \varepsilon)-}$indistinguishable if for all attackers ${ \mathcal{A} }$ running in time atmost ${ t }$ (time is measured in CPU cycles ?) it holds that \({ \mathbb{P}\left[ \texttt{PrivK} _{\mathcal{A}, \Pi} = 1 \right] \leq \frac{1}{2} + \varepsilon .}\)
Concrete security approach is sensitive to the exact computational model, and doesn’t lead to a nice theory.
We now focus on the asymptotic approach. Here we introduce a security parameter ${ n \in \mathbb{Z} _{>0} }$ (usually the key length) which is:
- Fixed by honest parties at initialization. It allows users to tailor the security level.
- Known by the adversary.
We now view running times of all parties, and success probability of the adversary, as functions of ${ n .}$
Computational indistinguishability (asymptotic approach): The informal idea is
- Security may fail with probability negligible in ${ n }.$
- Restrict attention to attacks running in time polynomial (in the sense polynomially bounded) in ${ n }.$
Recall a function ${ f : \mathbb{Z} _{> 0} \to \mathbb{R} _{\geq 0} }$ is polynomially bounded if there is a constant ${ c }$ such that ${ f(n) < n ^c }$ for all ${ n .}$ An algorithm ${ \mathcal{A} }$ runs in polynomial time if there is a polynomial ${ p }$ such that, for every input ${ x \in \lbrace 0, 1 \rbrace ^{*}, }$ the computation of ${ \mathcal{A}(x) }$ terminates within atmost ${ p(\vert x \vert) }$ steps. (What exactly is a step? What exactly is a Probabilistic Polynomial Time Algorithm? Should look at Theory of Computation)
A function ${ f : \mathbb{Z} _{> 0} \to \mathbb{R} _{\geq 0} }$ is negligible if for every polynomial ${ p }$ (with positive leading coefficient) there is an ${ N }$ such that ${ f(n) < \frac{1}{p(n)} }$ whenever ${ n > N .}$ A typical example is ${ f(n) = P(n) 2 ^{-cn} }$ for polynomial ${ P }.$
We can redefine encryption schemes and secure encryption, introducing a security parameter and runtime constraints.
(Re)Def: A private-key encryption scheme is defined by ${ 3 }$ PPT (Probabilistic Polynomial Time) algorithms ${ (\texttt{Gen, Enc, Dec}) }$ such that:
- ${ \texttt{Gen} }$ (key generation algorithm): Takes as input ${ 1 ^n }$ and outputs a key ${ k }$ with ${ \vert k \vert \geq n }.$ We write ${ k \leftarrow \texttt{Gen}(1 ^n) .}$
- ${ \texttt{Enc} }$ (encryption algorithm): Takes as input a key ${ k }$ and message ${ m \in \lbrace 0, 1 \rbrace ^{*} .}$ Outputs a ciphertext ${ c .}$ We write ${ c \leftarrow \texttt{Enc} _{k} (m) .}$
- ${ \texttt{Dec} }$ (decryption algorithm): Takes as input a key ${ k }$ and ciphertext ${ c.}$ Outputs a message ${ m \in \lbrace 0, 1 \rbrace ^{*} }$ or error ${ \perp .}$
- Correctness condition: For every ${ n,}$ for every key ${ k }$ output by ${ \texttt{Gen}(1 ^n) ,}$ and every ${ m \in \lbrace 0, 1 \rbrace ^{*} ,}$ we must have ${ \texttt{Dec} _k (\texttt{Enc} _k (m)) = m .}$
If ${ (\texttt{Gen, Enc, Dec}) }$ above is such that for ${ k }$ output by ${ \texttt{Gen}(1 ^n) }$ the algorithm ${ \texttt{Enc} _k }$ is defined only for messages ${ m \in \lbrace 0, 1 \rbrace ^{\ell (n)} ,}$ we say ${ (\texttt{Gen, Enc, Dec}) }$ is a fixed-length private-key encryption scheme for messages of length ${ \ell(n) .}$
Def: As before, we can define the adversarial indistinguishability experiment ${ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} (n) }$:
- The adversary ${ \mathcal{A} }$ is given input ${ 1 ^n },$ and outputs a pair of messages ${ m _0, m _1 }$ with ${ \vert m _0 \vert = \vert m _1 \vert .}$
- A uniform bit ${ b \in \lbrace 0, 1 \rbrace }$ is chosen. A key is generated using ${ \texttt{Gen}(1 ^n) }$ and ciphertext ${ c \leftarrow \texttt{Enc} _k (m _b) }$ is computed and given to ${ \mathcal{A} .}$
We call ${ c }$ the challenge ciphertext.
- ${ \mathcal{A} }$ outputs a bit ${ b ^{‘} \in \lbrace 0, 1 \rbrace .}$
- The output of the experiment is ${ 1 }$ (written ${ \texttt{PrivK} ^{\text{eav}} _{\mathcal{A}, \Pi} (n) = 1 }$) if ${ b = b ^{‘} }$ and ${ 0 }$ otherwise.
Def: An encryption scheme ${ \Pi }$ is computationally indistinguishable if for all PPT attackers ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon (n) }$ such that
\[{ \mathbb{P}\left[ \texttt{PrivK} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \frac{1}{2} + \varepsilon(n). }\]Pseudorandomness is an important building block for building computationally secure schemes. Like with computational security, there are two approaches to defining it, concrete and asymptotic.
Pseudorandomness (concrete approach):
Def: Let ${ D }$ be a probability distribution on ${ n -}$bit strings ${ \lbrace 0, 1 \rbrace ^n .}$ (Recall ${ x \leftarrow D }$ means ${ x }$ is sampled according to ${ D }$). We say ${ D }$ is ${ (t, \varepsilon) -}$pseudorandom if for all statistical tests ${ \mathcal{A} }$ running in time ${ \leq t }$ (Todo: check how time is talked about, without being hardware/implementation specific, in theory of computation) we have
where ${ U _n = \text{Unif}\lbrace 0, 1 \rbrace ^n.}$
Pseudorandomness (asymptotic approach):
Def: Consider a security parameter ${ n ,}$ and a polynomial ${ p(n) }$ (giving lengths of strings, as below). Consider a sequence of probability distributions ${ (D _n) }$ where each ${ D _n }$ is a distribution on ${ \lbrace 0, 1 \rbrace ^{p(n)} .}$ This sequence of distributions is pseudorandom if for every PPT attacker ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon }$ such that
where ${ U _{p(n)} = \text{Unif}\lbrace 0, 1 \rbrace ^{p(n)} .}$
A pseudorandom generator PRG is an efficient deterministic algorithm that expands a short uniform seed to a longer pseudorandom output.
Def: Let ${ G }$ be a deterministic poly-time algorithm, such that for any ${ n }$ and any input ${ s \in \lbrace 0, 1 \rbrace ^{n} }$ the result ${ G(s) \in \lbrace 0, 1 \rbrace ^{p (n)} }$ (and ${ p }$ is a polynomial). We call ${ G }$ a pseudorandom generator if:
- ${ p(n) > n }$ for every ${ n .}$
- For any PPT algorithm ${ \mathcal{A} ,}$ there is a negligible function ${ \varepsilon (n) }$ such that \({ \vert \mathbb{P} _{s \leftarrow U _n} [\mathcal{A}(G(s)) = 1] - \mathbb{P} _{s \leftarrow U _{p(n)} } [\mathcal{A}(s) = 1] \vert \leq \varepsilon(n). }\) That is, the sequence of distributions ${ D _n }$ given by the random vectors ${ \lbrace G(s) : s \leftarrow U _n \rbrace }$ is pseudorandom.
An unconditional proof that PRGs exist is not yet known. But there are practical examples of PRGs.
Def: Let ${ G }$ be a deterministic poly-time algorithm, with ${ \vert G(s) \vert = p(\vert s \vert) > \vert s \vert. }$ Let ${ n }$ be a security parameter. A pseudo one time pad encryption scheme works as follows:
- ${ \texttt{Gen} }$: on input ${ 1 ^n ,}$ output uniform ${ n-}$bit key ${ k .}$
- ${ \texttt{Enc} }$: given a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and message ${ m \in \lbrace 0, 1 \rbrace ^{p(n)} }$ output ${ c = G(k) \oplus m .}$
- ${ \texttt{Dec} }$: given a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and ciphertext ${ c \in \lbrace 0, 1 \rbrace ^{p(n)} }$ output ${ m = G(k) \oplus c .}$
Correctness condition clearly holds. If ${ G }$ is a PRG, we also have computational security.
Thm: If ${ G }$ is a PRG, the above pseudo one time pad is computationally secure.
Sketch: (Proof by Reduction)
Let ${ G }$ be a PRG with expansion factor ${ p(n), }$ and let ${ \Pi }$ be the above pseudo OTP encryption scheme. We are to show that ${ \Pi }$ is computationally secure, that is for any PPT attacker ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon(n) }$ such that ${ \mathbb{P}\left[ \texttt{PrivK} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \frac{1}{2} + \varepsilon(n). }$
Fix an ${ n }$. Let ${ \mathcal{A} }$ be a PPT attacker. We construct a distinguisher ${ D }$ that takes a string ${ w \in \lbrace 0, 1 \rbrace ^{p(n)} }$ as input and whose goal is to determine whether ${ w }$ was chosen uniformly or was a pseudorandom output from ${ G .}$ We can define ${ D }$ to be:
Distinguisher ${ D }$: ${ D }$ is given an input string ${ w \in \lbrace 0, 1 \rbrace ^{p(n)} }.$
- Run ${ \mathcal{A}(1 ^n) }$ to obtain pair of messages ${ m _0, m _1 \in \lbrace 0, 1 \rbrace ^{p(n)} .}$
- Choose a uniform bit ${ b \in \lbrace 0, 1 \rbrace .}$ Set ${ c = w \oplus m _b .}$
- Give ${ c }$ to ${ \mathcal{A} }$ and obtain output ${ b ^{‘} .}$ Output ${ 1 }$ if ${ b = b ^{‘} }$ and ${ 0 }$ otherwise.
We see ${ D }$ runs in polynomial time. In contrast to above pseudo OTP ${ \Pi ,}$ consider the classical OTP ${ \tilde{\Pi} }$ which generates and uses a uniform key ${ k \in \lbrace 0, 1 \rbrace ^{p(n)} .}$ Now:
- Perfect secrecy of OTP ${ \tilde{\Pi} }$ gives ${ \mathbb{P}\left[\texttt{PrivK} _{\mathcal{A}, \tilde{\Pi}} (n) = 1 \right] = \frac{1}{2}. }$
- If input to D is ${ w \leftarrow U _{p(n)} }$: The view of ${ \mathcal{A} }$ when run as a subroutine by ${ D }$ is identical to the view of ${ \mathcal{A} }$ in classical OTP experiment ${ \texttt{PrivK} _{\mathcal{A}, \tilde{\Pi}} (n) .}$ So
- If input to D is ${ w = G(k) }$ where ${ k \leftarrow U _n }$: The view of ${ \mathcal{A} }$ when run as a subroutine by ${ D }$ is identical to the view of ${ \mathcal{A} }$ in pseudo OTP experiment ${ \texttt{PrivK} _{\mathcal{A}, \Pi} (n) .}$ So
- By pseudorandomness of the (sequence of) distributions of ${ \lbrace G(k) : k \leftarrow U _n \rbrace ,}$ for some negligible function ${ \varepsilon(n) }$ we have
Substituting in the pseudorandomness condition the above values, we get
\[{ \left\vert \mathbb{P}\left[ \texttt{PrivK} _{\mathcal{A}, \Pi} (n) = 1 \right] - \frac{1}{2} \right\vert \leq \varepsilon(n) .}\]So especially ${ \mathbb{P}\left[ \texttt{PrivK} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \frac{1}{2} + \varepsilon(n) ,}$ as needed.
Week-3:
Compared to classical OTP, the pseudo OTP scheme allows for using shorter keys. But keys shouldn’t be reused in both the schemes.
We now consider CPA-security, that is security against chosen plaintext attacks (where attacker can both observe ciphertexts and obtain plaintext-ciphertext pairs for plaintexts of its choice).
In the formal definition we model chosen plaintext attacks by giving adversary ${ \mathcal{A} }$ access to an encryption oracle ${ \texttt{Enc} _k (\cdot) }$, viewed as a black box that encrypts messages of ${ \mathcal{A} }$’s choice using a key ${ k }$ unknown to ${ \mathcal{A} .}$
Def: Consider an encryption scheme ${ \Pi ,}$ adversary ${ \mathcal{A} }$ and security parameter ${ n .}$ The CPA-indistinguishability experiment ${ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi} (n) }$ is given by:
- A key ${ k }$ is generated using ${ \texttt{Gen}(1 ^n) .}$
- The adversary ${ \mathcal{A} }$ is given input ${ 1 ^n }$ and oracle access to ${ \texttt{Enc} _k (\cdot), }$ and outputs a pair of messages ${ m _0, m _1 }$ of the same length.
- A uniform bit ${ b \in \lbrace 0, 1 \rbrace }$ is chosen, and then a ciphertext ${ c \leftarrow \texttt{Enc} _k (m _b) }$ is computed and given to ${ \mathcal{A} .}$
- Adversary ${ \mathcal{A} }$ continues to have oracle access to ${ \texttt{Enc} _k (\cdot), }$ and outputs a bit ${ b ^{‘} .}$
- Output of the experiment is ${ 1 }$ if ${ b ^{‘} = b }$ and ${ 0 }$ otherwise.
Note the attacker can always succeed if ${ \texttt{Enc} _k (\cdot) }$ is deterministic.
Def: A private-key encryption scheme ${ \Pi }$ is CPA-secure if for any PPT adversary ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon (n) }$ such that
\[{ \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \frac{1}{2} + \varepsilon(n). }\]We now consider pseudorandom functions.
Def: A keyed function ${ F : \lbrace 0, 1 \rbrace ^{\ast} \times \lbrace 0, 1 \rbrace ^{\ast} \to \lbrace 0, 1 \rbrace ^{\ast} }$ is a two-input function, where the first input is called the key. (We say ${ F }$ is efficiently computable if there is a polynomial-time algorithm that computes ${ F(k, x) }$ given ${ k }$ and ${ x }$). Assume key length, input length and output length are all determined by security parameter ${ n }$: We associate with ${ F }$ three functions ${ \ell _{\text{key}}, \ell _{\text{in}}, \ell _{\text{out}}, }$ such that for any key ${ k \in \lbrace 0, 1 \rbrace ^{\ell _{\text{key}} (n)} }$ the function ${ F _k (\cdot) = F(k, \cdot) }$ is only defined for inputs ${ x \in \lbrace 0, 1 \rbrace ^{\ell _{\text{in}} (n) } }$ with outputs ${ F _k (x) \in \lbrace 0, 1 \rbrace ^{\ell _{\text{out}} (n)} }.$ Unless stated otherwise we assume ${ F }$ is length preserving meaning ${ \ell _{\text{key}} (n) = \ell _{\text{in}} (n) = \ell _{\text{out}} (n) = n. }$
A keyed function ${ F }$ induces a distribution on ${ \text{Func} _n = \lbrace \text{all maps } \lbrace 0, 1 \rbrace ^n \to \lbrace 0, 1 \rbrace ^n \rbrace, }$ where the distribution is obtained by choosing a uniform key ${ k \leftarrow U \lbrace 0, 1 \rbrace ^n }$ and considering ${ F _k (\cdot) .}$
Consider a polynomial time distinguisher ${ D ,}$ which is given access to a (black box) oracle ${ \mathcal{O} }$ which is either equal to ${ F _k }$ (where ${ k \leftarrow U \lbrace 0, 1 \rbrace ^n }$) or ${ f }$ (where ${ f \leftarrow U(\text{Func} _n) }$).
Since ${ D }$ runs in polynomial time, it can ask only polynomials many queries to the oracle.
Def: A (efficient, length preserving) keyed function ${ F : \lbrace 0, 1 \rbrace ^{\ast} \times \lbrace 0, 1 \rbrace ^{\ast} \to \lbrace 0, 1 \rbrace ^{\ast} }$ is a pseudorandom function (PRF) if for any PPT distinguisher ${ D }$ there is a negligible function ${ \varepsilon(n) }$ such that
\[{ \left\vert \mathbb{P} _{k \leftarrow U\lbrace 0, 1 \rbrace ^n} \left[ D ^{F _k (\cdot)} (1 ^n) = 1 \right] - \mathbb{P} _{f \leftarrow U(\text{Func} _n)} \left[D ^{f(\cdot)} (1 ^n) = 1 \right] \right\vert \leq \varepsilon (n). }\]Eg: Consider the keyed function
\[{ F(k, x) = k \oplus x .}\]For any input ${ x ,}$ the value ${ F _k (x) }$ is uniformly distributed (when ${ k }$ is uniform). But we will see ${ F }$ is not pseudorandom since its values on any two points correlated.
Consider a distinguisher ${ D }$ that queries its oracle ${ \mathcal{O} }$ on distinct points ${ x _1, x _2 }$ to obtain values ${ y _1 = \mathcal{O}(x _1), y _2 = \mathcal{O}(x _2) },$ and outputs ${ 1 }$ if and only if ${ y _1 \oplus y _2 = x _1 \oplus x _2 .}$
If ${ \mathcal{O} = F _k }$ for any ${ k ,}$ then ${ D }$ outputs ${ 1 .}$ If ${ \mathcal{O} = f }$ where ${ f \leftarrow U(\text{Func} _n), }$ then
since ${ f(x _2) }$ is uniform and independent of ${ x _1, x _2, f(x _1) .}$ So for this distinguisher the expression
\[{ \left\vert \mathbb{P} \left[ D ^{F _k (\cdot)} (1 ^n) = 1 \right] - \mathbb{P} \left[D ^{f(\cdot)} (1 ^n) = 1 \right] \right\vert = 1 - 2 ^{-n} }\]is not bounded by a negligible function. Hence ${ F }$ is not pseudorandom.
Thm: If ${ F }$ is a PRF, then ${ G(s) = F _s (0) \lVert F _s (1) \lVert F _s (2) \lVert \ldots \lVert F _s (\ell) }$ for any desired ${ \ell }$ is a PRG.
Sketch: (To do)
Consider ${ \text{Perm} _n \subseteq \text{Func} _n ,}$ the set of all permutations ${ \lbrace 0, 1 \rbrace ^n \to \lbrace 0, 1 \rbrace ^n .}$
Def: Let ${ F }$ be a keyed function, where for the moment ${ \ell _{\text{key}} , \ell _{\text{in}}, \ell _{\text{out}} }$ are arbitrary. We call ${ F }$ a keyed permutation if ${ \ell _{\text{in}} = \ell _{\text{out}}, }$ and for all ${ k \in \lbrace 0, 1 \rbrace ^{\ell _{\text{key}} (n)} }$ the function ${ F _k : \lbrace 0, 1 \rbrace ^{\ell _{\text{in}} (n)} \to \lbrace 0, 1 \rbrace ^{\ell _{\text{in}} (n)} }$ is a permutation. In this case, we call ${ \ell _{\text{in}} }$ the block length of ${ F .}$ (A keyed permutation is efficient if there is a polynomial-time algorithm for computing ${ F _k (x) }$ given ${ k }$ and ${ x ,}$ as well as a polynomial-time algorithm for computing ${ F _k ^{-1} (y) }$ given ${ k }$ and ${ y }$). Unless stated otherwise we assume ${ F }$ is length preserving, that is ${ \ell _{\text{key}} (n) = \ell _{\text{in}} (n) = n .}$
Definition of a pseudorandom permutation (PRP) is similar to that of a pseudorandom function: Informally no efficient algorithm can distinguish between black-box access to ${ F _k }$ (for uniform key ${ k }$) and black-box access to ${ f }$ (for uniform ${ f \in \text{Perm} _n }$).
We also have strong pseudorandom permutations, in whose definition we consider distinguishers with oracle access to both a permutation and its inverse.
Def: Let ${ F : \lbrace 0, 1 \rbrace ^{\ast} \times \lbrace 0, 1 \rbrace ^{\ast} \to \lbrace 0, 1 \rbrace ^{\ast} }$ be an (efficient, length preserving) keyed permutation. We call ${ F }$ a strong pseudorandom permutation (strong PRP) if for all PPT distinguishers ${ D }$ there is a negligible function ${ \varepsilon(n) }$ such that
\[{ \left\vert \mathbb{P} _{k \leftarrow U\lbrace 0, 1 \rbrace ^n} \left[D ^{F _k, F _k ^{-1}} (1 ^n) = 1 \right] - \mathbb{P} _{f \leftarrow U(\text{Perm} _n)} \left[D ^{f, f ^{-1}} (1 ^n) = 1\right] \right\vert \leq \varepsilon(n). }\]Thm: If ${ F }$ is a PRP with ${ \ell _{\text{in}} (n) \geq n }$ then ${ F }$ is a PRF.
Sketch: (To do)
PRFs can be used to construct CPA-secure encryption schemes.
Let ${ F }$ be a pseudorandom function. Define a (fixed length, private key) encryption scheme for messages of length ${ n }$ as follows:
- ${ \texttt{Gen} }$: on input ${ 1 ^n ,}$ choose a uniform ${ k \in \lbrace 0, 1 \rbrace ^n }$ and output it
- ${ \texttt{Enc} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and message ${ m \in \lbrace 0, 1 \rbrace ^n ,}$ choose a uniform ${ r \in \lbrace 0, 1 \rbrace ^n }$ and output ciphertext ${ c = \langle r, F _k (r) \oplus m \rangle }$
- ${ \texttt{Dec} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and ciphertext ${ c = \langle r, s \rangle ,}$ output message ${ m = F _k (r) \oplus s .}$
Thm: The above encryption scheme ${ \Pi }$ is CPA-secure.
Sketch: Let ${ \widetilde{\Pi} = (\widetilde{\texttt{Gen}}, \widetilde{\texttt{Enc}}, \widetilde{\texttt{Dec}}) }$ be an encryption scheme same as ${ \Pi }$ except that a truly random function ${ f }$ is used instead of ${ F _k .}$ (That is, ${ \widetilde{\texttt{Gen}}(1 ^n) }$ chooses an ${ f \leftarrow U(\text{Func} _n) ,}$ and ${ \widetilde{\texttt{Enc}} }$ encrypts like ${ \texttt{Enc} }$ but using ${ f }$ instead of ${ F _k }$).
Fix a PPT adversary ${ \mathcal{A} ,}$ and let ${ q(n) }$ be an upper bound on the number of queries that ${ \mathcal{A}(1 ^n) }$ makes to its encryption oracle.
Step 1: Show that
\[{ \left\vert \mathbb{P}\left[\texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi}(n) = 1 \right] - \mathbb{P}\left[\texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \right] \right\vert \leq \text{negl}(n). }\]We show the above by reduction: we use ${ \mathcal{A} }$ to construct a distinguisher ${ D }$ for the PRF ${ F .}$ The distinguisher ${ D }$ is given oracle access to ${ \mathcal{O} ,}$ and its goal is to determine whether ${ \mathcal{O} }$ is “pseudorandom” (ie equal to ${ F _k }$ for uniform ${ k \in \lbrace 0, 1 \rbrace ^n }$) or “random” (ie equal to ${ f }$ for uniform ${ f \in \text{Func} _n }$).
Distinguisher ${ D }$: It is given input ${ 1 ^n }$ and access to an oracle ${ \mathcal{O} : \lbrace 0, 1 \rbrace ^n \to \lbrace 0, 1 \rbrace ^n .}$
- Run ${ \mathcal{A}(1 ^n). }$ Whenever ${ \mathcal{A} }$ queries encryption oracle on a message ${ m \in \lbrace 0, 1 \rbrace ^n ,}$ answer this query in the following way:
- Choose uniform ${ r \in \lbrace 0, 1 \rbrace ^n }$
- Query ${ \mathcal{O}(r) }$
- Return the ciphertext ${ \langle r, \mathcal{O}(r) \oplus m \rangle }$ to ${ \mathcal{A} .}$
- When ${ \mathcal{A} }$ outputs messages ${ m _0, m _1 \in \lbrace 0, 1 \rbrace ^n ,}$ choose a uniform bit ${ b \in \lbrace 0, 1 \rbrace }$ and then:
- Choose uniform ${ r \in \lbrace 0, 1 \rbrace ^n }$
- Query ${ \mathcal{O}(r) }$
- Return the challenge ciphertext ${ \langle r, \mathcal{O}(r) \oplus m _b \rangle .}$
- Continue answering encryption-oracle queries of ${ \mathcal{A} }$ as above until ${ \mathcal{A} }$ outputs a bit ${ b ^{‘} .}$ Output ${ 1 }$ if ${ b = b ^{‘} }$ and ${ 0 }$ otherwise.
Note ${ D }$ runs in polynomial time.
If ${ \mathcal{O} = F _k }$ for uniform ${ k \in \lbrace 0, 1 \rbrace ^n }$, the view of ${ \mathcal{A} }$ when run as a subroutine of ${ D }$ is same as the view of ${ \mathcal{A} }$ in experiment ${ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi } (n). }$ So
\[{ \mathbb{P} _{k \leftarrow U \lbrace 0, 1 \rbrace ^n} \left[ D ^{F _k} (1 ^n) = 1 \right] = \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi} (n) = 1 \right] .}\]If ${ \mathcal{O} = f }$ for uniform ${ f \in \text{Func} _n }$, the view of ${ \mathcal{A} }$ when run as a subroutine of ${ D }$ is same as the view of ${ \mathcal{A} }$ in experiment ${ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n). }$ So
\[{ \mathbb{P} _{f \leftarrow U(\text{Func} _n)} \left[ D ^{f} (1 ^n) = 1 \right] = \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \right]. }\]By pseudorandomness of ${ F ,}$
\[{ \left\vert \mathbb{P}\left[ D ^{F _k} (1 ^n) = 1 \right] - \mathbb{P}\left[ D ^{f} (1 ^n) = 1 \right] \right\vert \leq \text{negl}(n).}\]Substituting above values in the expression, we have
\[{ \left\vert \mathbb{P}\left[\texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi}(n) = 1 \right] - \mathbb{P}\left[\texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \right] \right\vert \leq \text{negl}(n). }\]Step 2: Show that
\[{ \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \right] \leq \frac{1}{2} + \frac{q(n)}{2 ^n}. }\]Recall ${ q(n) }$ is an upper bound on the number of queries ${ \mathcal{A}(1 ^n) }$ makes to its encryption oracle.
Notice that every time a message ${ m }$ is encrypted in the experiment ${ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) }$ (either by the encryption oracle or when the challenge ciphertext is computed), a uniform ${ r \in \lbrace 0, 1 \rbrace ^n }$ is chosen and the ciphertext is set equal to ${ \langle r, f(r) \oplus m \rangle .}$ Let the challenge ciphertext be ${ \langle r ^{\ast}, f(r ^{\ast}) \oplus m _b \rangle .}$
There are two cases:
- The value ${ r ^{\ast} }$ is never used when answering any of ${ \mathcal{A} }$’s encryption-oracle queries: In this case ${ \mathcal{A} }$ learns nothing about ${ f(r ^{\ast}) }$ from its interaction with the encryption oracle. So the probability that ${ \mathcal{A} }$ outputs ${ b ^{‘} = b }$ in this case is exactly ${ \frac{1}{2} }$ (like in the one time pad).
- The value ${ r ^{\ast} }$ is used atleast once when answering ${ \mathcal{A} }$’s encryption-oracle queries: In this situation ${ \mathcal{A} }$ can determine whether ${ m _0 }$ or ${ m _1 }$ was encrypted. Since in some instance the encryption oracle returned ${ \langle r ^{\ast}, s \rangle }$ in response to a request to encrypt a message ${ m }$ (where ${ r ^{\ast} }$ is from the challenge ciphertext), the adversary learns that ${ f(r ^{\ast}) = s \oplus m .}$ Note this event occurs with probability ${ \leq q(n)/{2 ^n} .}$ (Note ${ \mathcal{A} }$ makes atmost ${ q(n) }$ queries to its encryption oracle, so atmost ${ q(n) }$ distinct values of ${ r }$ are used when answering ${ \mathcal{A} }$’s encryption-oracle queries, and the ${ r ^{\ast} }$ of challenge ciphertext is chosen uniformly from ${ \lbrace 0, 1 \rbrace ^n }$).
Let ${ \texttt{repeat} }$ be the event that ${ r ^{\ast} }$ is used atleast once when answering ${ \mathcal{A} }$’s encryption-oracle queries. Now
\[{ \begin{align*} &\mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \right] \\ = \, &\mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \, \wedge \, \texttt{repeat} \right] + \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \, \wedge \, \overline{\texttt{repeat}} \right] \\ \leq \, &\mathbb{P}\left[ \texttt{repeat} \right] + \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \widetilde{\Pi}} (n) = 1 \, \bigg\vert \, \overline{\texttt{repeat}} \right] \\ \leq \, &\frac{q(n)}{2 ^n} + \frac{1}{2}. \end{align*} }\]So Step ${ 2 }$ is done.
Combining Steps ${ 1 }$ and ${ 2 ,}$ we see
\[{ \begin{align*} \mathbb{P}\left[ \texttt{PrivK} ^{\text{cpa}} _{\mathcal{A}, \Pi} (n) \right] &\leq \frac{1}{2} + \frac{q(n)}{2 ^n} + \text{negl}(n) \\ &= \frac{1}{2} + \text{negl} ^{'} (n) \end{align*} }\]as needed.
We now briefly consider CTR (Counter) and CBC (Cipher Block Chaining) modes for encrypting blocks of plaintexts using PRFs.
Def: In CTR mode, encryption ${ \texttt{Enc} _k (m _1, \ldots, m _t) }$ is done as:
- Choose ${ \text{ctr} \leftarrow U\lbrace 0, 1 \rbrace ^n ,}$ set \({ c _0 = \text{ctr} .}\)
- For ${ i = 1 }$ to ${ t ,}$ set \({ c _i = m _i \oplus F _k (\text{ctr} + i). }\)
- Output ${ c _0, c _1, \ldots, c _t .}$
Thm: If ${ F }$ is a pseudorandom function, the above CTR mode is CPA secure.
Sketch: (To do)
Def: In CBC mode, encryption ${ \texttt{Enc} _k (m _1, \ldots, m _t) }$ is done as:
- Choose ${ c _0 \leftarrow U\lbrace 0, 1 \rbrace ^n }$ (also called IV, the initialisation vector)
- For ${ i = 1 }$ to ${ t }$ set \({ c _i = F _k (m _i \oplus c _{i-1}) .}\)
- Output ${ c _0, c _1, \ldots, c _t .}$
Note that here decryption requires ${ F _k }$ to be invertible.
Thm: If ${ F }$ is a pseudorandom permutation, the above CBC mode is CPA secure.
Sketch: (To do)
So far we only considered private-key encryption schemes with a passive eavesdropping attacker.
An attacker can also be active (interfering with the communication channel). We now consider malleability and CCA security.
Informally, a scheme is malleable if it is possible to modify a ciphertext and thereby cause a predictable change to the plaintext. Malleability is damaging (eg in encrypted email, encrypted transactions).
Def: Consider a private-key encryption scheme ${ \Pi ,}$ adversary ${ \mathcal{A} }$ and security parameter ${ n .}$ The CCA-indistinguishability experiment ${ \texttt{PrivK} ^{\text{cca}} _{\mathcal{A}, \Pi} (n) }$ is given by:
- A key is generated by running ${ \texttt{Gen}(1 ^n) .}$
- ${ \mathcal{A} }$ is given input ${ 1 ^n }$ and oracle access to ${ \texttt{Enc} _k (\cdot) }$ and ${ \texttt{Dec} _k (\cdot) .}$ It outputs a pair of equal length messages ${ m _0, m _1 .}$
- A uniform bit ${ b \in \lbrace 0, 1 \rbrace }$ is chosen, and then a challenge ciphertext ${ c \leftarrow \texttt{Enc} _k (m _b) }$ is computed and given to ${ \mathcal{A}. }$
- Adversary ${ \mathcal{A} }$ continues to have oracle access to ${ \texttt{Enc} _k (\cdot) }$ and ${ \texttt{Dec} _k (\cdot) ,}$ but is not allowed to query ${ \texttt{Dec} _k (c) .}$ Eventually ${ \mathcal{A} }$ outputs a bit ${ b ^{‘} .}$
- The output of the experiment is ${ 1 }$ if ${ b ^{‘} = b }$ and ${ 0 }$ otherwise.
Def: A private-key encryption scheme ${ \Pi }$ is CCA-secure if for any PPT adversary ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon(n) }$ such that
\[{ \mathbb{P}\left[ \texttt{PrivK} ^{\text{cca}} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \frac{1}{2} + \varepsilon(n). }\]Intuitively if a scheme is malleable it is not CCA secure (attacker can modify few bits in ${ c }$ and submit modified ciphertext ${ c ^{‘} }$ to the decryption oracle). So CCA secure schemes must be non-malleable.
All the schemes seen so far are malleable (especially the one time pad). Modern encryption schemes are designed to be CCA secure.
Week-4:
In the private-key setting, message authentication codes are used to ensure integrity.
Def: A message authentication code (MAC) consists of ${ 3 }$ PPT algorithms ${ (\texttt{Gen}, \texttt{Mac}, \texttt{Vrfy}) }$ such that:
- The key generation algorithm ${ \texttt{Gen} }$ takes as input the security parameter ${ 1 ^n }$ and outputs a key ${ k }$ with ${ \vert k \vert \geq n .}$
- The tag generation algorithm ${ \texttt{Mac} }$ takes as input a key ${ k }$ and message ${ m \in \lbrace 0, 1 \rbrace ^{\ast} }$ and outputs a tag ${ t \leftarrow \texttt{Mac} _k (m). }$
- The deterministic verification algorithm ${ \texttt{Vrfy} }$ takes as input a key ${ k ,}$ a message ${ m }$ and a tag ${ t .}$ It outputs a bit ${ b },$ with ${ b = 1 }$ meaning valid and ${ b = 0 }$ meaning invalid. We write ${ b := \texttt{Vrfy} _k (m, t).}$
- We have a correctness criterion, that for every ${ n, }$ every key ${ k }$ output by ${ \texttt{Gen}(1 ^n), }$ and every ${ m \in \lbrace 0, 1 \rbrace ^{\ast}, }$ it holds that ${ \texttt{Vrfy} _k (m, \texttt{Mac} _k (m)) = 1 .}$
If there is a function ${ \ell }$ such that for every ${ k }$ output by ${ \texttt{Gen}(1 ^n) }$ the algorithm ${ \texttt{Mac} _k (\cdot) }$ is defined only for messages ${ m \in \lbrace 0, 1 \rbrace ^{\ell (n)} }$ we call the scheme a fixed-length MAC for messages of length ${ \ell(n). }$
Def: Consider a MAC ${ \Pi ,}$ an adversary ${ \mathcal{A} }$ and security parameter ${ n .}$ The message authentication experiment ${ \texttt{Mac-forge} _{\mathcal{A}, \Pi} (n) }$ is given by:
- A key ${ k }$ is generated by running ${ \texttt{Gen}(1 ^n) .}$
- The adversary ${ \mathcal{A} }$ is given input ${ 1 ^n }$ and oracle access to ${ \texttt{Mac} _k (\cdot) .}$ The adversary eventually outputs ${ (m, t). }$ Let ${ \mathcal{Q} }$ be the set of all queries that ${ \mathcal{A} }$ submitted to its oracle.
- Output of the experiment is ${ 1 }$ if ${ \texttt{Vrfy} _k (m, t) = 1 }$ and ${ m \notin \mathcal{Q} .}$
Def: A MAC ${ \Pi }$ is secure if for any PPT adversary ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon(n) }$ such that
\[{ \mathbb{P} \left[ \texttt{Mac-forge} _{\mathcal{A}, \Pi} (n) = 1 \right] \leq \varepsilon(n). }\]Note that here secure MACs don’t provide security against replay attacks.
Def: Let ${ F }$ be a (length preserving) PRF. Define a fixed length MAC for messages of length ${ n }$ as:
- ${ \texttt{Mac} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and message ${ m \in \lbrace 0, 1 \rbrace ^n ,}$ output the tag ${ t := F _k (m). }$
- ${ \texttt{Vrfy} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n ,}$ message ${ m \in \lbrace 0, 1 \rbrace ^n }$ and tag ${ t \in \lbrace 0, 1 \rbrace ^n ,}$ output ${ 1 }$ if and only if ${ t = F _k (m). }$
Thm: If ${ F }$ is a PRF, the above construction is a secure fixed length MAC for messages of length ${ n .}$
Pf: (To do. Standard reduction proof. Using a PPT adversary ${ \mathcal{A} }$ for the MAC, construct a distinguisher ${ D }$ for the PRF)
Def: Let ${ F }$ be a PRF with length function ${ \ell(n) > 0.}$ The basic CBC-MAC construction is:
- ${ \texttt{Mac} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n }$ and message ${ m }$ of length ${ \ell (n) \cdot n , }$ do the following:
- Write ${ m }$ as ${ m = m _1 \ldots m _{\ell} }$ where each ${ m _{i} }$ is of length ${ n .}$
- Set ${ t _0 := 0 ^n .}$ Then, for ${ i = 1 }$ to ${ \ell }$ set \({ t _i = F _k (t _{i-1} \oplus m _i) .}\)
- Output ${ t _{\ell} }$ as the tag.
- ${ \texttt{Vrfy} }$: on input a key ${ k \in \lbrace 0, 1 \rbrace ^n ,}$ a message ${ m ,}$ and a tag ${ t ,}$ do: If ${ m }$ is not of length ${ \ell(n) \cdot n }$ output ${ 0.}$ Otherwise output ${ 1 }$ if and only if ${ t = \texttt{Mac} _k (m). }$
Thm: Let ${ \ell }$ be a polynomial, and ${ F }$ a PRF with length function ${ \ell .}$ Then the above construction is a secure MAC for messages of length ${ \ell(n) \cdot n .}$
Pf: (To do)
Def: A hash function (with output length ${ \ell(n) }$) is a pair of PPT algorithms ${ (\texttt{Gen}, H) }$ satisfying:
- ${ \texttt{Gen} }$ takes as input a security parameter ${ 1 ^n }$ and outputs a key ${ s .}$ We assume ${ n }$ is implicit in ${ s .}$
- ${ H }$ is a deterministic algorithm that takes as input a key ${ s }$ and a string ${ x \in \lbrace 0, 1 \rbrace ^{\ast} }$ and outputs a string ${ H ^s (x) \in \lbrace 0, 1 \rbrace ^{\ell (n)} }$ (where ${ n }$ is the value of the security parameter implicit in ${ s }$).
If ${ H ^s }$ is defined only for inputs ${ x }$ of length ${ \ell ^{‘} (n) > \ell (n) ,}$ we say ${ (\texttt{Gen}, H) }$ is a fixed-length hash function for inputs of length ${ \ell ^{‘} (n) .}$
Def: Consider a hash function ${ \mathcal{H} = (\texttt{Gen}, H) ,}$ an adversary ${ \mathcal{A} ,}$ and a security parameter ${ n .}$ The collision-finding experiment ${ \texttt{Hash-coll} _{\mathcal{A}, \mathcal{H}} (n) }$ is given by:
- A key ${ s }$ is generated by running ${ \texttt{Gen}(1 ^n) .}$
- The adversary ${ \mathcal{A} }$ is given ${ s ,}$ and outputs ${ x, x ^{‘} .}$ (If ${ \mathcal{H} }$ is a fixed-length hash function for inputs of length ${ \ell ^{‘} (n) ,}$ then we require ${ x, x ^{‘} \in \lbrace 0, 1 \rbrace ^{\ell ^{‘} (n)} }$).
- The output of the experiment is defined to be ${ 1 }$ if and only if ${ x \neq x ^{‘} }$ and ${ H ^{s} (x) = H ^{s} (x ^{‘}) .}$ In such a case we say that ${ \mathcal{A} }$ has found a collision.
Def: A hash function ${ \mathcal{H} = (\texttt{Gen}, H) }$ is collision-resistant if for any PPT adversary ${ \mathcal{A} }$ there is a negligible function ${ \varepsilon(n) }$ such that
\[{ \mathbb{P}\left[ \texttt{Hash-coll} _{\mathcal{A}, \mathcal{H}} (n) = 1 \right] \leq \varepsilon(n). }\]Def: Let ${ \Pi = (\texttt{Mac, Vrfy}) }$ be a MAC for messages of length ${ \ell(n) ,}$ and let ${ \mathcal{H} = (\texttt{Gen} _{H}, H) }$ be a hash function with output length ${ \ell (n) .}$ Construct a MAC ${ \Pi ^{‘} = (\texttt{Gen} ^{‘}, \texttt{Mac} ^{‘}, \texttt{Vrfy} ^{‘}) }$ for arbitrary length messages as follows:
- ${ \texttt{Gen} ^{‘} }$: on input ${ 1 ^n ,}$ choose uniform ${ k \in \lbrace 0, 1 \rbrace ^n }$ and run ${ \texttt{Gen} _H (1 ^n) }$ to obtain ${ s ;}$ output the key ${ (k, s) }$
- ${ \texttt{Mac} ^{‘} }$: on input a key ${ (k, s) }$ and a message ${ m \in \lbrace 0, 1 \rbrace ^{\ast} ,}$ output ${ t \leftarrow \texttt{Mac} _k (H ^s (m)) }$
- ${ \texttt{Vrfy} ^{‘} }$: on input a key ${ (k, s), }$ a message ${ m \in \lbrace 0, 1 \rbrace ^{\ast} ,}$ and a tag ${ t ,}$ output ${ 1 }$ if and only if ${ \texttt{Vrfy} _k (H ^s (m), t) = 1 .}$
Thm: If ${ \Pi }$ is a secure MAC for messages of length ${ \ell(n) }$ and ${ \mathcal{H} }$ is collision resistant, the above hash-and-MAC construction is a secure MAC for arbitrary length messages.
Pf: (To do)
Thm: If we have a CPA secure encryption scheme and a secure MAC, then the encrypt-then-authenticate protocol is CCA secure.
Pf: (To do)
Note authenticate-then-encrypt is not secure as the tags can leak information about the messages. For eg if ${ \texttt{Mac} _k (\cdot) }$ is deterministic then an attacker knows when two ciphertexts correspond to the same message.
Counters and identities are used to prevent replay attacks, reordering attacks, reflection attacks, etc.
(Todo: Proofs for above results; Public key encryption)