Python already ships a base64 module in its standard library.  So why write our own?

I can offer two reasons.  Writing your own will solidify your understanding of the base64 algorithm.  Also, if you write your own, you can add in extra functionality that might not exist in the standard library version.

Believe it or not, this post is actually part of the RSA series.  As per the requirements found in RFC 1421, PEM encoded keys need to be base64 encoded.  More accurately, their ASN.1 DER encoded objects need to be base64 encoded.  I’ll make a post about ASN.1 DER encoding at some point in the future.

For now, let’s look at base64 encoding.  It’s a pretty simple system that turns arbitrary binary data into printable text.  That is to say, no matter what input you use, the encoded output will consist solely of the following 65 printable characters:

'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/',
'='
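
As a small illustration, the table above can be built from the standard string module.  This is only a sketch; the names ALPHABET and PAD are my own choices for this example and not something the standard library defines.

```python
import string

# The 64 data characters in index order: value 0 is 'A', value 63 is '/'.
# ALPHABET and PAD are illustrative names chosen for this sketch.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="

print(len(ALPHABET))        # 64
print(ALPHABET.index("a"))  # 26
```

A character's position in ALPHABET is the 6-bit value it stands for, which is why the order matters.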

Some of you may be wondering why it is called base64 instead of base65, given that there are a total of 65 possible characters.  This is because the ‘=’ character is actually just a padding character and will only ever show up at the end of a base64 string.  Base64 turns every 3 bytes of input into 4 output characters, so when the input length is not a multiple of 3, one or two ‘=’ characters fill out the final group.  The code should help you understand how the padding is determined.
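
To make the padding rule concrete, here is a minimal sketch of an encoder.  It is not meant as a definitive implementation; the function name b64encode_sketch and the constant ALPHABET are hypothetical names used only for this example, and the standard library's base64 module is imported purely to cross-check the result.

```python
import base64  # standard library, used here only to cross-check the sketch

# Illustrative constant: the 64 data characters in index order.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as a base64 string (hypothetical helper for illustration)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short in the last group
        # Pack the group into one 24-bit integer, zero-filling any missing bytes.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came only from zero-fill are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(b64encode_sketch(b"Man"))  # TWFu
print(b64encode_sketch(b"Ma"))   # TWE=
print(b64encode_sketch(b"M"))    # TQ==
print(b64encode_sketch(b"Ma") == base64.b64encode(b"Ma").decode("ascii"))  # True
```

Notice that the number of ‘=’ characters equals the number of bytes missing from the final 3-byte group, so it can only ever be zero, one, or two.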
