Recovering AES key from side-channel analysis
Today I have a write-up for one of the crypto challenges from the Google 2022 CTF (https://capturetheflag.withgoogle.com/) called Electric Mayhem CLS.
The focus of this challenge is side-channel attack on AES,
“A side-channel attack is an attack on the information of a cryptographic device. The concept of ‘information’ in the context of a side-channel attack generally refers to a secret key of a cryptographic algorithm. A side-channel attack is carried out by monitoring the physical outputs of a device (e.g. power consumption, time taken to carry out an operation, emission of heat, light and sound). The hypothesis made in such an attack is that the physical outputs of a cryptographic device demonstrate correlation with the internal state of the device
when conducting cryptographic operations.”
For the challenge in question we are provided with 50 physical output measurements with 1806 samples each taken during the encryption process of 50 different plaintexts.
The measurements are visualized using an oscilloscope (if I remember from my university classes correctly, looking at the peaks and minimums of the oscilloscope output, it looks like 10 rounds of AES = therefore 128 bit input and 128 bit secret key; I am hoping that Google wouldn’t just show 10 rounds out of 12/14 just for the kicks :])
A brief summary of AES, the AES is a symmetric block cipher (128-bit blocks), it has a layered structure (S-box, confusion, diffusion, key addition), it performs mathematical operations with elements of Galois fields (extension fields) and depending on the length of the secret key from which it derives individual round keys it executes either 10 (128-bit key), 12 (192-bit key) or 14 (256-bit key) rounds of AES per block.
For this challenge I am using the CPA (Correlation Power Analysis) analysis and the article from the Journal of Cyber Security Technology as a reference (see above).
Because we are dealing with a 128-bit secret key = 16 bytes and for each byte of the secret key we have 256 possible guesses (0x00–0xFF), we effectively have to “bruteforce” only 4096 guesses,
as opposed to bruteforcing the entire possible 128-bit keyspace.
We therefore go over all the bytes of the key, indexed from 0 to 15 and for each byte we guess all the possible values, indexed 0 to 255.
What is a correlation? Correlation is a mutual relationship or connection between two or more sets.
To calculate the correlation coefficient we need two input sets x and y.
To calculate x, for each guess of the key byte we perform the initial AES steps (key addition/key whitening and S-box substitution) across all the 50 plaintexts and calculate the Hamming distance of the 50 results (the first for loop below).
To calculate y, now we work with the 50 measurements with 1806 samples each (the second for loop below). This will yield a set of 1806 intermediary correlation coefficients from which we choose the maximum, absolute = final_correlations (the other values are “noise”).
By running the calculateCorrelation function across all possible guesses for a single byte (all bytes) of the secret key we gain information about how each individual guess correlates (what is its relationship and to what level) to the 50 measurements.
If correlation is found then gathering a large number of traces will enable one to show that the correctly predicted cipher key will demonstrate the highest level of correlation.
Here are three visualizations of the correlation coefficients for correct secret key, the red point with the highest absolute value (closest to 1) corresponds to the correct byte of the secret key.
The solution “bruteforces” the correct bytes of the secret key and their corresponding correlation coefficients.
The complete source code to the solution can be found below.