Side channels and PlaidCTF 2013

Posted on 23.04.2013

There are already blog posts on this out there; however, I still feel a sort of duty to write up what I learned from PlaidCTF's compression crypto challenge.

The challenge itself was presented as a service which takes our input, prepends it to some unknown plaintext, compresses the result, then encrypts it with properly initialized AES in CTR mode, using a 64-bit nonce prefix and a 64-bit counter. This follows NIST's recommendations for CTR mode, so as far as we know it cannot currently be attacked directly. However, the compression step provides a side channel.
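The cipher setup itself is not what leaks. As a sketch of the counter-block layout (with hashlib standing in for the AES block cipher purely to keep the snippet dependency-free — this is not real AES-CTR), the 64-bit prefix / 64-bit counter split looks like this, and note that a CTR keystream XOR preserves the length of whatever it encrypts:

```python
import hashlib
import struct

def ctr_keystream_blocks(key: bytes, nonce: int, nblocks: int):
    # NIST-style CTR counter block: 64-bit nonce prefix || 64-bit block counter.
    # hashlib stands in for the AES block cipher here, so this only sketches
    # the layout rather than implementing the real cipher.
    for ctr in range(nblocks):
        counter_block = struct.pack(">QQ", nonce, ctr)
        yield hashlib.sha256(key + counter_block).digest()[:16]

def ctr_encrypt(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    # Stream cipher: XOR the plaintext with the keystream, so the ciphertext
    # is exactly as long as the (compressed) plaintext fed into it.
    nblocks = (len(plaintext) + 15) // 16
    stream = b"".join(ctr_keystream_blocks(key, nonce, nblocks))
    return bytes(p ^ s for p, s in zip(plaintext, stream))
```

Because the ciphertext length equals the compressed length, every length observation discussed below survives the encryption step untouched.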

To understand how side channels in compression work, you need to understand how compression works. A naive explanation is that compression builds a dictionary and replaces repeats of letters, combinations of letters and so on with references. The reality is a little more complicated, of course, but that is the essential bare bones of what a compression algorithm does.
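As a rough illustration of that intuition (zlib's DEFLATE uses back-references rather than a literal dictionary, but the effect is the same): highly repetitive input collapses, while varied input with no repeats does not:

```python
import zlib

repetitive = b"abcabcabc" * 50             # 450 bytes of pure repetition
varied = bytes(range(256)) + bytes(range(255, -1, -1))  # 512 bytes, no repeats

# Back-references collapse the repeats; the varied input barely shrinks.
print(len(zlib.compress(repetitive)))
print(len(zlib.compress(varied)))
```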

Therefore, if I can prepend some text to an unknown string and observe the resulting compressed lengths, I can deduce whether my string had any impact on the size. If the length increases, chances are the text is not already present inside the original; if it does not, the text may well be present.

To see this in action, run this little python script:

import zlib
print(len(zlib.compress("side channel".encode("utf-8"))))
print(len(zlib.compress("ABCDside channel".encode("utf-8"))))
print(len(zlib.compress("sideside channel".encode("utf-8"))))

You should see that the third length shows an improvement in compression over the second, with the first acting as a baseline. This lets us deduce that "side" is in "side channel", had we not already known it.
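The same comparison can be wrapped as a small oracle that ranks guesses against a junk baseline. SECRET here is a local stand-in for the unknown plaintext; in the real challenge the length would come back from the server instead:

```python
import zlib

SECRET = "side channel"  # local stand-in for the unknown plaintext

def oracle(prefix: str) -> int:
    # In the real challenge this length comes back from the server;
    # locally we just compress prefix + secret ourselves.
    return len(zlib.compress((prefix + SECRET).encode("utf-8")))

baseline = oracle("ABCD")            # junk that should not occur in the secret
candidates = ["side", "chan", "wxyz"]
hits = [c for c in candidates if oracle(c) < baseline]
print(hits)
```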

The script I wrote post-Plaid was to see whether automated guessing (via permutations) could deduce a string. It works by trying a repeated prefix against a given string, then working out whether any one of the possible prefixes compresses better than the others. In the event of a tie, permutations of two possible characters are tried.

This script succeeds for a number of test cases. For many others it does not, and loops indefinitely; in some cases only a partial match is achieved. Remember that, while it would be easy to simply inspect zlib's dictionary, for the purposes of a side channel we assume we have no access to the compressed data itself.

#!/usr/bin/env python3
import zlib
from itertools import permutations

string = "the_ctf_games"

compressed = zlib.compress(string.encode("utf-8"))
compressed_len = len(compressed)
print("Compressed length is", compressed_len)

found_chars = []
char_array = [chr(ord('a') + i) for i in range(26)] + ['_']

for j in range(1, len(string) + 1):
    copies = 1                      # how many characters to guess at once
    current_cr = 0                  # best compressed length seen so far
    chosen_char = '\0'
    multiples_of = len(string) + 1  # target length of the repeated prefix
    increase_copies = False

    while True:

        # Baseline: the prefix found so far plus '#', a character
        # assumed not to occur in the plaintext.
        prefix = ''.join(found_chars) + '#'
        prefix_rep = ''.join(prefix for i in range(1, multiples_of // len(prefix)))
        prefixed_string = prefix_rep + string
        current_cr = len(zlib.compress(prefixed_string.encode("utf-8")))
        if increase_copies:
            copies += 1
            print("Copies now", copies)

        # Try every permutation of `copies` characters as the next guess.
        for char in map(''.join, permutations(char_array, copies)):
            prefix = ''.join(found_chars) + char
            if multiples_of // len(prefix) < 1:
                multiples_of += len(string)
            prefix_rep = ''.join(prefix for i in range(1, multiples_of // len(prefix)))
            prefixed_string = prefix_rep + string
            prefix_clen = len(zlib.compress(prefixed_string.encode("utf-8")))

            print(prefix, prefix_rep, prefix_clen, current_cr, chosen_char)

            if prefix_clen < current_cr:
                # This guess compresses better: remember it.
                chosen_char = char
                current_cr = prefix_clen
                increase_copies = False
            elif prefix_clen == current_cr and current_cr != 0:
                # A tie: try longer guesses on the next pass.
                increase_copies = True
            if current_cr == 0:
                current_cr = prefix_clen

        if chosen_char != '\0' and increase_copies == False:
            break
        if increase_copies == False:
            multiples_of = multiples_of + len(string)
            print(multiples_of)

    found_chars.extend(chosen_char)

print("Found string is:", ''.join(found_chars))

Note: you don't need that script to solve the Plaid challenge. I was simply intrigued as to how easy it would be to automatically decode a compressed string using this method. If you play around with the string you will see different results from the algorithm.

Using this knowledge, we should be able to take on the compression challenge. To do this, we needed a script to query the server:

#!/usr/bin/python
# Python 2: the wire format mixes str bodies and struct.pack output directly.
import socket
import struct
import zlib

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(1)
s.connect(("addr", 4433))

messages = [{'challenge': 'cri', 'response': ''},
            {'challenge': 'some', 'response': ''}]

for m in messages:
    body = m['challenge']
    # Length-prefixed message: a 4-byte length, then the body itself.
    wire_msg = struct.pack('I', len(body)) + body
    s.send(wire_msg)

    result = ''

    while True:
        try:
            chunk = s.recv(1)
        except socket.timeout:
            break
        if chunk == '':
            break
        result += chunk
    m['response'] = result

s.close()

for message in messages:
    print "Message %s produced response of length %d" % (message['challenge'], len(message['response']))

Using this script we were able to choose queries to send to the server. The initial stages were simple guesswork for the Plaid example. I eventually deduced that "some" existed in the plaintext and from there, with a bit more work, ended up with crime_sometimes_pays as the flag.
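For completeness, the end-to-end guessing loop can be simulated locally. SECRET and the crime_ foothold below are assumptions standing in for the live server, and, as noted earlier, ties at byte granularity can derail the greedy choice, so treat this as a sketch rather than a reliable decoder:

```python
import string
import zlib

SECRET = b"crime_sometimes_pays"   # stands in for the server's hidden plaintext
ALPHABET = (string.ascii_lowercase + "_").encode("utf-8")

def oracle(prefix: bytes) -> int:
    # Simulates the server: prepend, compress, and return the length an
    # attacker would observe on the wire (CTR encryption preserves it).
    return len(zlib.compress(prefix + SECRET))

known = b"crime_"                  # assumed foothold, as with the guessed "some"
while len(known) < len(SECRET):
    # Extend by whichever byte compresses best when prepended; the true
    # continuation usually wins, but byte-granularity ties are possible.
    best = min(ALPHABET, key=lambda c: oracle(known + bytes([c])))
    known += bytes([best])
print(known.decode("utf-8"))
```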

This technique is employed by a specific attack called CRIME, as detailed on Sec.SE. It is an application of "Compression and Information Leakage of Plaintext", a paper going as far back as 2002 which documents the problems with applying compression to plaintext before encryption.