As part of the ongoing RSA series, I will be walking you through the basics of ASN.1 objects and their role in RSA.

You will be using this online RSA key generator to generate keys so that you can follow along.

Upon visiting the online tool, you should be greeted with the ability to choose a key length and generate an RSA key pair.  Doing so will populate the public and private key boxes underneath.  This output should be recognizable as base64 encoded data.  If you followed along with my base64 article, you should have a tool that allows you to decode this data.  Otherwise, you will need to use an online tool like this.

Here’s an example:

Public Key: 
MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAJgTm6UbtISP6oy4P4sU/PfSI6+E9VJn
LXIGCZlDDhgYHMOlUAp/PEpbjyLxq2/dSUaqXle70/edfh9i2XnE/vMCAwEAAQ==

Private Key:
MIIBVAIBADANBgkqhkiG9w0BAQEFAASCAT4wggE6AgEAAkEAmBObpRu0hI/qjLg/i
xT899Ijr4T1UmctcgYJmUMOGBgcw6VQCn88SluPIvGrb91JRqpeV7vT951+H2LZec
T+8wIDAQABAkBLhHEl7DwYF99BQb1MM3/rEE7oOf4YjWPj21uo38N/8rSJtkcG+1J
Yhq+u/KAtTxtf/HQDmFGheOAuNSQ5fisBAiEA2GTNlFiZfhbRBAdUrlUd1LpWmMhH
B6anfsGEElatRmUCIQCz6TQ8GBPcQG8Kj07WAnZjT/qBV/sMzjw28PWKGwTOdwIhA
NYCg6rWIR+pkxfH5EDx3ynXC/PYBx+S+44J9wNoA8BdAiBE5gg1A1uHu71Ko/sjBi
pkehqLMjBYhRqWR80gqJw8nQIgLL3uyV+nSiGgiozH7OYj427w7gG9ea2vJFRkgbV
QhaA=

Running each of these through a base64 decoder yields:

Public Key:
305c300d06092a864886f70d0101010500034b0030480241009
8139ba51bb4848fea8cb83f8b14fcf7d223af84f552672d7206
0999430e18181cc3a5500a7f3c4a5b8f22f1ab6fdd4946aa5e5
7bbd3f79d7e1f62d979c4fef30203010001

Private Key:
30820154020100300d06092a864886f70d01010105000482013e
3082013a02010002410098139ba51bb4848fea8cb83f8b14fcf7
d223af84f552672d72060999430e18181cc3a5500a7f3c4a5b8f
22f1ab6fdd4946aa5e57bbd3f79d7e1f62d979c4fef302030100
0102404b847125ec3c1817df4141bd4c337feb104ee839fe188d
63e3db5ba8dfc37ff2b489b64706fb525886afaefca02d4f1b5f
fc74039851a178e02e3524397e2b01022100d864cd9458997e16
d1040754ae551dd4ba5698c84707a6a77ec1841256ad46650221
00b3e9343c1813dc406f0a8f4ed60276634ffa8157fb0cce3c36
f0f58a1b04ce77022100d60283aad6211fa99317c7e440f1df29
d70bf3d8071f92fb8e09f7036803c05d022044e60835035b87bb
bd4aa3fb23062a647a1a8b323058851a9647cd20a89c3c9d0220
2cbdeec95fa74a21a08a8cc7ece623e36ef0ee01bd79adaf2454
6481b55085a0
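If you would rather reproduce the hex output above locally, here is a minimal sketch that uses Python's standard base64 module instead of the tool from the base64 article.  The names PUBLIC_KEY_B64 and b64_to_hex are made up for this illustration.

```python
import base64

# Public key from the example above, with the line break removed.
PUBLIC_KEY_B64 = (
    "MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAJgTm6UbtISP6oy4P4sU/PfSI6+E9VJn"
    "LXIGCZlDDhgYHMOlUAp/PEpbjyLxq2/dSUaqXle70/edfh9i2XnE/vMCAwEAAQ=="
)


def b64_to_hex(b64_text: str) -> str:
    """Decode base64 text and return the raw bytes as a lowercase hex string."""
    return base64.b64decode(b64_text).hex()


public_key_hex = b64_to_hex(PUBLIC_KEY_B64)
print(public_key_hex)
```

The same function should work on the private key once its line breaks are removed.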

If you are wondering what it is that you are looking at, this article is for you.

Congratulations, you found a DER encoded ASN.1 object

DER stands for Distinguished Encoding Rules.  It is one of the ways ASN.1 data is serialized, and it relies on the Tag, Length, Value (often abbreviated as TLV) format for encoding.

Here’s a break down of the Public Key:

Tag: 30 (Sequence), Length: 5c == 92, Data: 92 bytes to follow
30 5c {
    
    Tag: 30 (Sequence), Length: 0d == 13, Data: 13 bytes to follow
    30 0d {
    
        Tag: 06 (OID), Length: 09, Data: 1 2 840 113549 1 1 1
        06 09 2a864886f70d010101
    
        Tag: 05 (Null), Length: 00, Data: None
        05 00
    }
    
    Tag: 03 (Bit String), Length: 4b == 75, Data: 75 bytes to follow, 0 bits unused
    03 4b 00 {
    
        Tag: 30 (Sequence), Length: 48 == 72, Data: 72 bytes to follow
        30 48 {
         
            Tag: 02 (Integer), Length: 41 == 65, Data: **
            02 41 0098139ba51bb4848fea8cb83f8b14fcf7d223af84f552672d72060999430e18181cc3a5500a7f3c4a5b8f22f1ab6fdd4946aa5e57bbd3f79d7e1f62d979c4fef3
         
            Tag: 02 (Integer), Length: 03, Data: 65537
            02 03 010001
        }
    }
}

** Formatting issues got in the way of properly displaying the data value in the code block.  The decimal value of the data is: 7964897496159126758881741521670894000374251523300551501644790124527572547876244077990060430797205222302690103081717589738822230766500351365623910741901043

As you can see, the first byte represents the tag indicating the type of data to follow.  The next byte gives the length (in bytes) of data to follow.  The next n number of bytes (as indicated by the given length) represent the actual data.
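To make the tag, length, value split concrete, here is a small sketch that pulls apart the public exponent from the breakdown above.  It only handles single-byte lengths, and the function name read_short_tlv is an assumption made for this example.

```python
def read_short_tlv(hex_text: str):
    """Split a hex string into (tag, length, value) for a single-byte length."""
    tag = hex_text[0:2]
    length = int(hex_text[2:4], 16)
    if length > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    # Each byte is two hex characters, so the value spans length * 2 characters.
    value = hex_text[4:4 + length * 2]
    return tag, length, value


tag, length, value = read_short_tlv("0203010001")
print(tag, length, value, int(value, 16))  # prints: 02 3 010001 65537
```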

The OID tag lets the parser know what type of ASN.1 object it is looking at.  The OID 1 2 840 113549 1 1 1 corresponds to an RSA key.  Outside of that, there are a handful of nested data structures that ultimately contain nothing more than two integers.  The first is the value of n and the second is the value of e.
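For the curious, here is a rough sketch of how the nine OID bytes turn into 1 2 840 113549 1 1 1.  The first byte packs the first two numbers as 40 * first + second, and every later number is stored in base 128 with the high bit marking "more bytes follow".  The function name decode_oid is hypothetical, and the sketch assumes the first number is 0 or 1, which is the case here.

```python
def decode_oid(hex_text: str) -> str:
    """Decode the value bytes of a DER OID into dotted notation."""
    body = bytes.fromhex(hex_text)
    # First byte holds the first two arcs (assumes the first arc is 0 or 1).
    arcs = [body[0] // 40, body[0] % 40]
    value = 0
    for byte in body[1:]:
        # Low seven bits carry data; the high bit means the arc continues.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


print(decode_oid("2a864886f70d010101"))  # prints: 1.2.840.113549.1.1.1
```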

The same process can be done with the example private key:

30 82 0154 {
    02 01 00
    30 0d {
        06 09 2a864886f70d010101
        05 00
    }
    04 82 013e {
        30 82 013a {
            02 01 00
            02 41 0098139ba51bb4848fea8cb83f8b14fcf7d223af84f552672d72060999430e18181cc3a5500a7f3c4a5b8f22f1ab6fdd4946aa5e57bbd3f79d7e1f62d979c4fef3
            02 03 010001
            02 40 4b847125ec3c1817df4141bd4c337feb104ee839fe188d63e3db5ba8dfc37ff2b489b64706fb525886afaefca02d4f1b5ffc74039851a178e02e3524397e2b01
            02 21 00d864cd9458997e16d1040754ae551dd4ba5698c84707a6a77ec1841256ad4665
            02 21 00b3e9343c1813dc406f0a8f4ed60276634ffa8157fb0cce3c36f0f58a1b04ce77
            02 21 00d60283aad6211fa99317c7e440f1df29d70bf3d8071f92fb8e09f7036803c05d
            02 20 44e60835035b87bbbd4aa3fb23062a647a1a8b323058851a9647cd20a89c3c9d
            02 20 2cbdeec95fa74a21a08a8cc7ece623e36ef0ee01bd79adaf24546481b55085a0
        }
    }
}

As you can see, this object is structured very similarly to the other.  Both have the same object ID, which makes sense since they are both RSA keys.  The big difference here is that, while the public key only held two integers, the innermost sequence of the private key holds nine.  Part 2 of my RSA series explains why.

A scrupulous observer may have noticed the byte 0x82 found between the tag byte and the length bytes for any tag that has a multi-byte length.  Good eye!  I’ll get into the exact details of how this works as part of the coding section.

So, let’s get coding!

So far, we’ve been looking at a DER encoded ASN.1 object.  While it is tempting to jump right into decoding these objects, I think that showing how they are encoded is actually a better place to start.  That way, you’ll have a better understanding of how length is determined and you will be set to understand why I decode the length the way I do.

As always, I’ll give the basic structure of the object and then I’ll fill it in piece by piece.

class DER:
    def __init__(self):
        self.data = ""
        
    def get_data(self):
        return self.data
    
    def calculate_length(self, data):
        pass
    
    def tag_integer(self, integer):
        pass
    
    def tag_bitstring(self, bitstring):
        pass

    def tag_octetstring(self, octetstring):
        pass

    def tag_sequence(self, sequence):
        pass
    
    def encode(self, data, encoding):
        pass

As you can see from the method names, there aren’t very many different tags we need to work with for RSA keys.  While it is possible to utilize the many other tags available to extend this object with the ability to pack and unpack other types of ASN.1 objects, my only concern is RSA keys.

As such, part of encoding a key will include hard coding the OID accordingly.

def hard_code_rsa_oid(self):
    return "300d06092a864886f70d0101010500" + self.data

def encode(self, data, encoding):
    for integer in data:
        self.data += self.tag_integer(integer)
    self.data = self.tag_sequence(self.data)
    if encoding == "PKCS#8":
        self.data = self.tag_octetstring(self.data)
    if encoding == "X.509":
        self.data = self.tag_bitstring(self.data)
    self.data = self.hard_code_rsa_oid()
    if encoding == "PKCS#8":
        self.data = "020100" + self.data
    self.data = self.tag_sequence(self.data)

As you can see, I start with the inner most data: the integers that, combined, represent a specific RSA key.  I then successively encapsulate that data with the appropriate tags to match the examples.

One thing to note is the encoding parameter.  Looking at the code, you should be able to tell that there are two options: PKCS#8 and X.509.  These correspond directly to private and public keys respectively.

Now that the encode method outlines the encapsulation I am hoping for, it is time to fill in the specifics on building out each tag.

def tag_bitstring(self, bitstring):
    tag = "03"
    bitstring = "00" + bitstring
    bit_len = self.calculate_length(bitstring)
    return tag + bit_len + bitstring

def tag_octetstring(self, octetstring):
    tag = "04"
    oct_len = self.calculate_length(octetstring)
    return tag + oct_len + octetstring

def tag_sequence(self, sequence):
    tag = "30"
    seq_len = self.calculate_length(sequence)
    return tag + seq_len + sequence

As you can see, all three of the encapsulating methods work the same way.  tag_bitstring is a touch different because it adds an additional byte to the data used to indicate how many bits should be ignored from the end of the data.  In my case, I’m not setting anything up to have unused bits so I just set it to 0.
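To show what that extra byte does when the data is not a whole number of bytes, here is a small sketch that encodes an arbitrary string of bits.  It is not part of the DER class above: the function name der_bitstring_from_bits is made up, and it only supports single-byte lengths.

```python
def der_bitstring_from_bits(bits: str) -> str:
    """Encode a string of '0'/'1' characters as a DER bit string (hex)."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    # Pad with zero bits up to a byte boundary and record how many were added.
    unused = (8 - len(bits) % 8) % 8
    padded = bits + "0" * unused
    body = bytes([unused]) + int(padded, 2).to_bytes(len(padded) // 8, "big")
    if len(body) > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return "03" + format(len(body), "02x") + body.hex()


print(der_bitstring_from_bits("101"))       # prints: 030205a0
print(der_bitstring_from_bits("11111111"))  # prints: 030200ff
```

In the first call, five padding bits were added, so the leading byte is 05.  In the second, the data already fills a whole byte, so it is 00, just like in the RSA public key.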

Now, to build the encapsulated data itself.

def hexify(self, data):
    hexified = hex(data)[2:]
    return "0" + hexified if len(hexified) % 2 else hexified

def tag_integer(self, integer):
    tag = "02"
    hexified = self.hexify(integer)
    if int(hexified[0], 16) & 8:
        hexified = "00" + hexified
    hex_len = self.calculate_length(hexified)

    return tag + hex_len + hexified

Here, you can see that the integer (coming in as a decimal integer) is being converted to a hex string with an even number of characters by way of the hexify method.

Further, tag_integer is using the value of the first nibble (the first four bits of a byte) to determine if the integer’s most significant bit is set to 1.  DER integers are signed, so a leading 1 bit would cause the value to be interpreted as a negative integer.  To avoid this, if the integer’s MSB is set to 1, tag_integer prepends a 0x00 byte to the integer.
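Here is a standalone sketch of the same idea with a few sample values.  Rather than checking the first nibble, it uses int.to_bytes with a size that always leaves room for a zero sign bit, which is a different technique from the method above but should produce the same bytes.  The function name der_integer is hypothetical, and only non-negative values with single-byte lengths are handled.

```python
def der_integer(value: int) -> str:
    """Encode a non-negative integer as a DER INTEGER (hex string)."""
    if value < 0:
        raise ValueError("negative integers are not handled in this sketch")
    # bit_length() // 8 + 1 bytes always leaves the top bit clear, so a
    # leading 0x00 byte appears exactly when the MSB would otherwise be 1.
    size = value.bit_length() // 8 + 1
    body = value.to_bytes(size, "big")
    if len(body) > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return "02" + format(len(body), "02x") + body.hex()


print(der_integer(127))    # prints: 02017f
print(der_integer(128))    # prints: 02020080
print(der_integer(65537))  # prints: 0203010001
```

Note how 128 (0x80) picks up a leading 00 byte while 127 (0x7f) does not.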

All of the tag_* methods have made reference to a calculate_length method.

def calculate_length(self, data):
    # data is a hex string, so two characters make up one byte
    byte_length = len(data) // 2
    hexified_length = self.hexify(byte_length)
    if byte_length < 128:
        return hexified_length
    return self.hexify(len(hexified_length) // 2 + 128) + hexified_length

Remember that 82 we saw in the private key length byte?  See if you can figure out how it works based on the calculate_length code.

For those who figured it out, well done.  For those who only have a rough idea, no worries.  I’ll dive into it right now.

A length byte in ASN.1 reserves the MSB as a flag that indicates whether or not the length can be represented with only one byte, or if it needs to use multiple bytes to represent the length.  Unfortunately, this means that the first byte is now short the flag bit and can therefore only hold 2^7 = 128 different values.  This equates to anything between 0 and 127 inclusive.  Thus, a length value of anything above 127 means that the MSB will need to be set and the rest of the byte will be used to indicate how many additional bytes will represent the actual length.

So, the value 82 seen in the hex output for the private key indicates that (8) the length is above 127 and (2) it will be encoded into 2 additional bytes.  Those two bytes are then read in as the length of the data.  Looking back at the hex output we started with, you should be able to see this as the case.
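The rule can be checked with a short standalone sketch that works on byte counts directly.  The function names encode_length and decode_length are made up for this illustration and are not part of the DER class.

```python
def encode_length(byte_count: int) -> str:
    """Return the DER length field (hex) for a value of byte_count bytes."""
    if byte_count < 128:
        # Short form: the length fits in the low seven bits of one byte.
        return format(byte_count, "02x")
    # Long form: 0x80 + number of length bytes, then the length itself.
    size = (byte_count.bit_length() + 7) // 8
    return format(0x80 | size, "02x") + byte_count.to_bytes(size, "big").hex()


def decode_length(hex_text: str):
    """Return (length, hex characters consumed) for a DER length field."""
    first = int(hex_text[0:2], 16)
    if first < 128:
        return first, 2
    size = first & 0x7F
    return int(hex_text[2:2 + size * 2], 16), 2 + size * 2


print(encode_length(92))        # prints: 5c
print(encode_length(128))       # prints: 8180
print(encode_length(340))       # prints: 820154
print(decode_length("820154"))  # prints: (340, 6)
```

The values 5c and 820154 match the outer sequence lengths of the example public and private keys.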

Now that you have seen how to encode ASN.1 data, let’s move to decoding.

def __init__(self):
    self.data = ""
    self.integers = []

--snipped--
    
def get_key_data(self):
    return tuple(self.integers)

def handle_integer(self, integer):
    pass
    
def handle_bitstring(self, bitstring):
    pass

def handle_octetstring(self, octetstring):
    pass

def handle_sequence(self, sequence):
    pass

def decode(self, data, index=0):
    pass

I chose to return the integers as a tuple simply because that’s how I had them coded in my RSA object.  If you look over the code from RSA part 2, you should be able to see that the keys are stored in the object as tuples.  I chose to match that format here.

Let’s look at the actual decode method now.

def get_tlv(self, data, begin):
    pass

def decode(self, data, index=0):
    tag, length, value = self.get_tlv(data, index)

    if tag == "02":
        self.handle_integer(value)
    if tag == "03":
        self.handle_bitstring(value)
    if tag == "04":
        self.handle_octetstring(value)
    if tag == "30":
        self.handle_sequence(value)

    # These counts are in hex characters (two per byte)
    tag_bytes = 2
    hexed_length = hex(length)[2:]
    if len(hexed_length) % 2:
        hexed_length = "0" + hexed_length
    length_bytes = 2 if length < 128 else 2 + len(hexed_length)

    # Continue with the next TLV, keeping the offset we started from
    next_index = index + tag_bytes + length_bytes + len(value)
    if next_index != len(data):
        return self.decode(data, next_index)

Because of the nested nature of ASN.1 objects, I opted to utilize recursive function calls to unpack it.  That might not be immediately obvious because of the external calls to other methods, but it should be pretty clear upon inspection of those methods.

def handle_integer(self, integer):
    self.integers.append(int(integer, 16))
    
def handle_bitstring(self, bitstring):
    return self.decode(bitstring, 2)

def handle_octetstring(self, octetstring):
    return self.decode(octetstring)

def handle_sequence(self, sequence):
    if sequence[0:30] == "300d06092a864886f70d0101010500":
        return self.decode(sequence, 30)
    if sequence[0:36] == "020100300d06092a864886f70d0101010500":
        return self.decode(sequence, 36)
    return self.decode(sequence)

Basically, the way I coded it will continue to decapsulate each layer until it gets to the integers, where it will extract them and discard everything else.  I went the lazy route on the OID and chose to completely skip it since I am only trying to handle the one specific ASN.1 object.  This would need to change if you wanted to extend this code to handle more than just RSA keys.

The last piece of the puzzle is a method get_tlv.

def get_tlv(self, data, begin):
    start = begin
    stop = begin + 2
    
    tag = data[start : stop]
    start, stop  = stop, stop + 2
    length = int(data[start : stop], 16)

    if length > 127:
        length_indicator = int(data[start : stop], 16) - 128
        start, stop = stop, stop + length_indicator * 2
        length = int(data[start : stop], 16)
        start, stop = stop, stop + length * 2
        value = data[start : stop]
    else:
        start, stop = stop, stop + length * 2
        value = data[start : stop]
    return (tag, length, value)

As you can see, this splits the data into proper tag, length, and value data and returns the three to be further parsed.
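For comparison, here is a compact standalone sketch of the same walk done on raw bytes instead of hex strings, run against the example public key.  It is an alternative take, not the DER class from this article: the names read_tlv and collect_integers are hypothetical, and it assumes every bit string and octet string it meets wraps more DER data, which holds for these RSA key structures but not for ASN.1 in general.

```python
# Modulus bytes (with the leading 00) from the public key breakdown above.
N_HEX = (
    "0098139ba51bb4848fea8cb83f8b14fcf7d223af84f552672d72060999430e1818"
    "1cc3a5500a7f3c4a5b8f22f1ab6fdd4946aa5e57bbd3f79d7e1f62d979c4fef3"
)
PUBLIC_KEY_HEX = (
    "305c" "300d06092a864886f70d0101010500" "034b00" "3048"
    "0241" + N_HEX + "0203010001"
)


def read_tlv(data: bytes, pos: int):
    """Return (tag, value, next_pos) for the TLV that starts at pos."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:
        # Long form: the low seven bits count the length bytes that follow.
        size = length & 0x7F
        length = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    return tag, data[pos:pos + length], pos + length


def collect_integers(data: bytes, found=None):
    """Walk nested TLVs and gather every INTEGER value in order."""
    found = [] if found is None else found
    pos = 0
    while pos < len(data):
        tag, value, pos = read_tlv(data, pos)
        if tag == 0x02:
            found.append(int.from_bytes(value, "big"))
        elif tag in (0x30, 0x04):
            collect_integers(value, found)
        elif tag == 0x03:
            # Skip the unused-bits byte before descending.
            collect_integers(value[1:], found)
        # Any other tag (OID, Null) is ignored.
    return found


integers = collect_integers(bytes.fromhex(PUBLIC_KEY_HEX))
print(len(integers), integers[1])  # prints: 2 65537
```

Unlike handle_sequence, this sketch does not skip the PKCS#8 version field, so running it on the private key would also return the outer version integer.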

And that should do it.  If you want to see how this is actually used by the RSA object, check out RSA part 3.

References:

https://crypto.stackexchange.com/questions/29115/how-is-oid-2a-86-48-86-f7-0d-parsed-as-1-2-840-113549

https://learn.microsoft.com/en-us/windows/win32/seccertenroll/about-sequence?redirectedfrom=MSDN

https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/
