31 August 2020

RSA For The Mathematically Challenged

by Blake Eakin

RSA For the Mathematically Challenged

This week in the P(CC)^2 CTF we retire a network traffic analysis problem and an RSA cryptography challenge. Both of these topics give ample opportunities for discussion, but I've chosen to give an explanation of approaches to RSA challenges since they are fairly common, but also can be a problem for those without much experience with it. RSA challenges usually signal the cutoff point between challenges featuring simple, unpractical encryption methods and serious methods more akin to what would be implemented in real life. A big barrier for some people in approaching RSA challenges is that most available explanations for how the encryption scheme works come across as mathematically convoluted. This makes it difficult to understand what the data you are given in a challenge actually is or what one should do with it. I would like to try offering up an approach that avoids a lot of the mathematical jargon and cuts to the heart of RSA challenges, but also gives proper explanations for what is necessary to understand the process.

What is RSA?

The first idea to wrap your head around with RSA is that it is a form of asymmetric encryption. This means that there are two different keys relevant to communicating with the scheme: a public key and a private key. The public key is used to encrypt information being passed to the private key holder, but it can't decrypt that information, while the private key can decrypt it. Probably one of the most common places you see RSA in action is with keyed ssh authentications. You generate a private and public key, pass the public key to the ssh server to store, and keep your private key super secure. To authenticate, your ssh client takes your private key and some session data that was created while establishing the connection, and creates a signature that will be passed to the server. The server will test it by creating the same signature with its available public key. If it checks out then you pass that portion of authentication. You can read more about it in RFC 4254, section 7 if interested, but it's not necessarily relevant to completing challenges.

It is worth it to mention that while RSA encryption with sufficient key size is regarded as highly secure, it is also fairly slow to encrypt/decrypt compared to a lot of symmetric encryption schemes. Because of this RSA is mainly used to encrypt small amounts of data, typically symmetric encryption keys so that they may be securely passed around and then that key is used to decrypt larger data.

How does it work?

Primes

RSA encryption depends on prime numbers - numbers which can only be divided by one and themselves without leaving a remainder - and if you have ever wondered why finding big prime numbers is important, it is mostly because of these sorts of encryption schemes. Why prime numbers? For one, their lack of factors makes it so that the primes used to generate a key are the only numbers that can generate it. If the numbers used had a lot of factors then that would increase the amount of numbers that could generate the same key. Secondly, prime numbers are hard to find. There is no efficient method of, say, getting the next prime in a series without already just having a table of primes. There are a few ways of making educated guesses, but it mostly comes down to brute force.

For the purposes of RSA two prime numbers need to be randomly chosen. They can be any two prime numbers, but there are some ideal qualifications for them. It's best if they are not commonly known prime numbers, also they should be similar in size while still being a few digits off one another. It can be trivial to crack a key that just uses 3 and 7, and the difficulty of finding two sufficiently large, obscure prime numbers that are off by a few digits can be exponential. After getting two sufficient primes it is extremely important that they remain secret since they form the basis for building a key. Everything else can be easily derived given an RSA key's primes. This is why generating proper primes is usually left up to the program making your keys and not the user.

Key Generation

Here is where things start to get hairy. One of the confounding factors of RSA challenges is that you are usually just given some numbers associated with some letters that don't necessarily provide you any information as to what they are. On top of that, most explanations of the key generation process don't do much to clear that up if you're not already familiar with the base mathematical concepts. The first number generated after picking primes in the process is n. Simply enough, n is the result of multiplying the two prime numbers together, or n = pq where p and q are your two primes. This is information that's available with the public key, and the length of n's bit representation is the key length. You may think that since you're given a number that's the product of the two primes then figuring out what those primes are should be pretty simple, but that's why you want larger key lengths. If you have a 4096-bit key length, then the size of your n can be as large as 24096, which can be prohibitively large for a system to factor.

Next is usually the first road block in understanding the key generation, and that's finding λ(n). λ represents Carmichael's totient function, and that's all a lot of confusing words and symbols that are honestly irrelevant to our purposes. We will leave it at λ(n) can be found by subtracting one from both of your primes and finding their least common multiple ( λ(n) = lcm(p-1, q-1) ). This number also needs to be kept secret, but is usually just discarded after making keys anyways. Typically you aren't concerned with working backwards to find λ(n), so don't worry too much about it.

Another number you'll usually find in an RSA challenge is e, which is also included in a public key. e can be any range of integers from 1 to λ(n), non-inclusive, that is also coprime with λ(n). Coprime just means that the only positive integer that divides two integers is 1, or more simply the greatest common divisor of the two numbers is 1. For example, 15 and 8 are coprime because the only number that can divide evenly into both numbers is 1. Again, the process for finding a suitable e is done to generate RSA keys, and isn't something you need to understand for cracking RSA. Generally, the e will be provided to you since it is a public key component, and you only need it for decrypting purposes after breaking down the n.

The final number to be produced during key generation is d, which is known as the private key exponent. d is calculated from e and λ(n) through the formula d = (1 mod λ(n))/e. If you're not familiar, mod stands for the modulus operation, which provides you with only the remainder of a division between two numbers. For example 7 mod 5 = 2. However, this is not as straight forward of a formula at first glance. What you need to calculate for d is the modular multiplicative inverse. Once more there is no reason to worry however because we can just use online tools to calculate this for us without even needing to understand what a modular multiplicative inverse is. Those tools will be included in the upcoming sections.

Encryption and Decryption

That gives us all the components of public and private keys. Now, how are all of those numbers used to encrypt something like a message made up of text? Encryption and decryption actually use fairly simple formulas that work on whatever byte level representations of the data you're working with. Encryption uses the formula me = c mod n where e and n are provided by the public key, m stands for the original message (what you're trying to crack in a challenge), and c is the resulting ciphertext (which is provided in a challenge). Decryption occurs through the formula m = cd mod n. Next we will look at what you are even supposed to do with that.

Breaking Bad...RSA Encryption

This finally leads us to the question of how are we supposed to use these various formulas and bits of data to break RSA encryption from public key data? RSA's strength, like pretty much all forms of security, depends on how well it is implemented, and solving a CTF challenge involving it depends on picking up the weaknesses in the implementation from what is provided from the public key.

Breaking Simple Cases By Hand

There are two situations in which breaking RSA is nearly trivially easy. The first of which is when e=1. If you recall that the formula for encryption is me = c mod n, so if e=1 then you can simply state the formula as m = c mod n since any number raised to the power of 1 is equal to itself. And then you have all of the information you need to compute m by just plugging in c and n.

A little more difficult, but also about as simple as it gets, is when the public key consists of a weak n. It is easy to find <a href="https://primes.utm.edu/lists/small/10000.txt' target="_blank">lists of primes</a> up to a certain length, and being able to reduce the list of possible primes down to a handful makes factoring n relatively easy to brute force. It's also easy to catch when a really weak prime has been used. For instance, 2, 3 and 5 are all prime and have simple divisibility tests which can be used to either know immediately that one them is a factor or rule out a greater span of primes. Take this example:

    
        n = 8989
        e = 13
        c = 4854
    

First we want to get a list of all primes less than 8989, which gives us a list of about 1116 primes. It is not even, it does not end in a 5, and the sum of its digits are not divisible by 3, so we can rule out those three as well as any primes larger than 8989/5 since we know any prime above that would have to be multiplied against one of these smaller, ruled-out primes to possibly result in 8989. This widdles our list of possible primes down to 275, a major improvement. Then in any scripting language we can load that list of primes into an array and check it against our n. Here is an example in Python with the full array declaration omitted to preserve space:

    
        primes = [7,..., 1789]
        n = 8989

        for prime in primes:
            if n % prime == 0 and (n / prime) in primes:
                print(prime, str(n/prime))
                break
    

Which provides us with the results of p=89, q=101. Remember that the decryption formula is cd=m mod n so we need to use the primes we found to calculate d from d=e-1 mod λ(n). Here is a quick breakdown of the whole calculation:

    
        λ(n) = lcm(p-1, q-1)
        λ(n) = lcm(89-1, 101-1)
        λ(n) = 2200
        
        d = e-1 mod λ(n)
        d = 13-1 mod 2200
        d = 677

        m = cdmod n
        m = 4854677 mod 8989
        m = 65
    

Just a couple of notes for those following along at home. First, you may be wondering how in the world I got 677 for d. What you are actually computing there is called the modular multiplicative inverse. I could spend a lot of time trying to explain what that is and how you calculate it, but for solving these challenges it's easiest to just use a calculator, input 13 for "Integer a" and 2200 for "Modulus m" and you'll get the proper answer. Next, you might be looking at our resultant message of 65 and trying to figure out what you are supposed to do with that exactly. Keep in mind that this encryption is just mathematical formulas and does not concern itself with what the information actually represents. What you will end up with is a decimal representation of your data that you then need to convert into whatever sort of encoding makes sense for you purpose. With this example it's simple enough to convert this decimal representation in an ascii encoded character 'A', which was the message I originally encrypted. It's not much, but RSA can only encrypt data that is less than its n value. If I wanted to encrypt more data it would have to have been broken down into chunks. So, if you end up with a long decimal value as a result for m then try converting it into hex and encoding it byte by byte.

Tools For More Complicated Public Keys

Hopefully I did not leave you scratching your head with the last section, but if you still do not think you can do these challenges then just stick with me for a second because it's about to get a lot simpler because I am going to introduce some tools that will make it so you don't ever have to think about modular arithmetic, or ds and λs to break some keys.

My previous explanations will get you through the easiest RSA challenges, but they can ramp up pretty quickly from there and make those methods absolutely worthless. After all, what do you do when the n is much, much larger? I mean, I did say there's no reliably efficient way of generating primes, and I did not lie. Thankfully people for decades now have been doing all the hard work for us and now there are databases full of prime factorizations! One of the better sites available for this purpose is factordb.com. There's even a python library to use it on the command line or in scripts. Now that is pretty sick, but you still do not want to do any of the math after finding the prime either? Well, lucky for us again there is a command line tool to pull all of this together into one nice package: RsaCtfTool. Now, at this point you may be seething with anger, just wanting to shake me and yell "Blake, why did you waste all of my time explaining how to do everything before this if there's a tool that could do it all for me?!" Well, sorry for trying to teach you something. Alright, let's get into a more complicated example for RsaCtfTool to help us through.

    
        n = 85459589
        e = 17
        c = 53064722
    
    
        $ RsaCtfTool.py -n 8545989 -e 17 --uncipher 53064722 --attack factordb
        [...]
        Unciphered data :
        b'\x00flg'
    

You can disregard the null byte at the beginning of the byte string and you can see that they even convert to ascii for you to give you the message of 'flg'. You can leave off the --attack option and it will cycle through all of its many available attack types until one of them pops out a proper result. It is a versatile tool so be sure to check out the available options.

Practical Application

There is so much more that could be talked about with regards to breaking RSA, but a lot of the more advanced topics really build off of understanding the underlying math and mostly require specific situations to work. With what we have covered so far though you will be able to solve a good chunk of the RSA-related CTF challenges. There still might be one thing sitting uneasy with you though, and it might be that you are not sure how all of this fits in with how RSA is actually used. You may have had a bit of experience generating and using RSA keys and never once see anything like ns and es and whatnot. Your keys mostly just look like a bunch of base64 encoded data that when decoded is even more nonsense. I am gonna take this last section to quickly explain how what we have gone over ties in to how applications of RSA work with keys.

Now, it is true that the RSA keys you work with are all encoded in base64, or to be more precise in PEM format. PEM stands for Privacy Enhanced Mail, a standard established in the early '90s for protecting data sent by email that is mostly not used except for its key encoding method where it sandwiches the binary data of a key between a header and a footer. For RSA, the binary data that is PEM encoded is a serialized represenation of an ASN.1 data structure. ASN.1, or Abstract Syntax Notation One, is a sort of language developed specifically for describing custom data structures as binary data and transmit between processes or networks. Think of it like a binary JSON. You do not need to know too much about how ASN.1 works, but if you want to play around with a public key there are several online tools for decoding PEM into ASN.1. Here is one that I like to use that I loaded up with an example RSA public key I randomly grabbed off of Google. It is a 1024-bit key, and the relevant bits of data are in the third nested sequence element. You can see the first number is a 1024-bit integer, which will be the n, followed by the e of 65537, which is a very common number used in RSA for e.

Conclusion

That should be plenty of information to get you started playing around with RSA in challenges or in general. There is a lot deeper to go with it, and the math involved is really interesting for the adventurous among us. Hopefully I could help you out without too much trouble, but if you have more questions do not hesitate to hit me up on the discord.

Challenges For Practice

tags: capture the flag - crypto - cryptography - rsa