In part 1 we demystified the following ways the term "encoding" is used:

"Let's write the output to a UTF-8 encoded file"

In part 2 we'll address the remaining ways "encoding" could be used:

"Our message is safe because it's encoded using Base64"

This statement deals with several different concepts. I'll start by going over the types of encoding.

As best as I can tell there are 2 different categories for encoding: character encodings and binary-to-text encodings. ASCII and UTF-8 are examples of character encodings. Base64 is an example of a binary-to-text encoding.

What's the difference? Both character encodings and binary-to-text encodings share the same goal of turning bits into characters. However, character encodings are designed to produce human-readable output, while binary-to-text encodings are designed to turn bits into human-printable output.

Wait, what? That was a nebulous distinction you say? Okay, let me try to explain it in a different way.

A character encoding like ASCII is really good for data storage and transmission. For example, say you're writing a speech. You want to save it on your computer so you don't have to re-type it every time. The computer stores that speech as a bunch of 1s and 0s. ASCII is needed to translate those bits back into the words, letters, and punctuation that make up the speech. In the same way, say you want to upload the speech to the cloud. The exact same process is needed to transport that speech over the Internet.

Base64 is an example of a binary-to-text encoding. In fact, it's by far the most widely used one, much like UTF-8 is for character encodings on the web. Its alphabet is a subset of ASCII, containing 64 of the 128 ASCII characters: a-z, A-Z, 0-9, +, and /. It doesn't contain characters like NUL or EOT (which are examples of non-printable characters).

Base64 is often used to translate a binary file to text, or even a text file with non-printable characters to one with only printable characters. The benefit of this is that you can output the contents of any type of file, no matter what data it contains. It doesn't have to be limited to a file either; it can be just a string, such as a password. Also, you are guaranteed to always have characters that can be displayed, no matter what the underlying bits are. That is something UTF-8 cannot accomplish.

I described in the UTF-8 section in part 1 how certain bit patterns at the start of a byte indicate how many bytes the character will be: 0 for 1 byte, 110 for 2 bytes, 1110 for 3 bytes, and 11110 for 4 bytes. And it uses 10 to indicate a byte is a continuation byte. This means that byte sequences that don't follow this pattern are incomprehensible to UTF-8. A byte that doesn't start with 0, 10, 110, 1110, or 11110 wouldn't be rendered properly by UTF-8. For example, UTF-8 doesn't understand 11111111.

Let's show this on the command line with a new file, file3.txt:

And now the file has printable characters:

But how did we end up with /zIzCg==? I'll take this one step at a time to avoid confusion.

Base64 has 64 characters in its alphabet. That means it only needs 6 bits to represent the whole alphabet (2^6 = 64). UTF-8 uses the leading bits in a byte as metadata to determine whether it's a starting byte or a continuation byte. Those bits don't hold any information about the character being stored (i.e., the actual data). In contrast, Base64 treats every bit of the input as data. However, as I mentioned, it only reads 6 bits at a time.

file3.txt's binary representation is 11111111 00110010 00110011 00001010. Let's start by examining the Base64 table, which looks very similar to the ASCII table:

Character set for Base64 encoding: values 0-25 map to A-Z, 26-51 map to a-z, 52-61 map to 0-9, 62 maps to +, and 63 maps to /.

The way Base64 works is to interpret the bits in groups of 6. So even though the logical grouping of a byte is 8 bits, we're going to modify the groupings to be 6 bits (to reflect how Base64 sees this): 111111 110011 001000 110011 000010 10. In fact, let's look at it in a table format to make things easier:
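To make the regrouping of file3.txt's bits concrete, here is a minimal Python sketch. It assumes the file holds exactly the four bytes 11111111 00110010 00110011 00001010 discussed in the text. The helper names (is_valid_utf8, six_bit_groups, encode_by_hand) are my own and purely illustrative; the only library call is base64.b64encode from Python's standard library, used as a cross-check. Treat it as one way to follow the steps by hand, not as how any particular tool implements Base64.

```python
import base64

# Assumed contents of file3.txt: 11111111 00110010 00110011 00001010
data = bytes([0b11111111, 0b00110010, 0b00110011, 0b00001010])

# Standard Base64 alphabet: index 0-63 -> character
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def six_bit_groups(raw):
    """Write the bytes as one bit string, then cut it into groups of 6."""
    bits = "".join(f"{byte:08b}" for byte in raw)
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


def encode_by_hand(raw):
    """Illustrative Base64 encoder that follows the 6-bit grouping step by step."""
    chars = []
    for group in six_bit_groups(raw):
        padded = group.ljust(6, "0")      # a short final group is filled with 0s
        chars.append(ALPHABET[int(padded, 2)])
    encoded = "".join(chars)
    # Output length must be a multiple of 4, so pad with "=" as needed
    return encoded + "=" * (-len(encoded) % 4)


# 11111111 is not a legal UTF-8 byte, so the decode fails
print("valid UTF-8?", is_valid_utf8(data))

# One row per 6-bit group: the bits, their value, and the Base64 character
for group in six_bit_groups(data):
    value = int(group.ljust(6, "0"), 2)
    print(f"{group:<6}  {value:>2}  {ALPHABET[value]}")

print("by hand:", encode_by_hand(data))
print("library:", base64.b64encode(data).decode("ascii"))
```

Both encoders give /zIzCg==. The last group, 10, is only 2 bits long, so it is filled out with zeros to 100000 (the character g), and the two = signs pad the output to a multiple of 4 characters.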