Base64 encoding

Overkill • 2016-06-30 20:46

Interesting discovery today: Because the latest version of pico8 supports more punctuation characters, it's now possible to encode binary strings as base64!

I prototyped this on pico8 0.1.8. It slightly improves the size of binary data encoded as text. The encoded string is about 1.33x (4/3) the size of the raw data, compared to 2x for hex. Granted, pico8 compresses the code, so I'm not sure whether the added entropy of packing the data more tightly will work in everyone's favor, but it's worth a shot. It won't be standard base64, because there's no distinction between uppercase and lowercase. However, there are enough special characters to make up a custom 64-character dictionary for the encoding.

-- Set up the encoding/decoding lookup tables.
-- b64c is a custom 64-character dictionary (digits, lowercase letters, punctuation).
-- b64d[''] = 0 guards lookups past the end of the string, where sub() returns ''.
b64d={['']=0}b64e={}b64c='0123456789abcdefghijklmnopqrstuvwxyz-_+[]{}|:;=.<>?/~`!@#$%^&*()'
for i=1,#b64c do
 local c=sub(b64c,i,i)
 b64e[i-1]=c
 b64d[c]=i-1
end

-- encodes a table containing 8-bit numbers (binary data) into a base64 string
function encode(t)
 local s,e='',b64e
 -- each full group of 3 bytes (24 bits) becomes 4 characters of 6 bits each
 for i=1,flr(#t/3)*3,3 do
  s=s..e[flr(t[i]/4)]..e[bor(t[i]%4*16,flr(t[i+1]/16))]..e[bor(t[i+1]%16*4,flr(t[i+2]/64))]..e[t[i+2]%64]
 end
 -- leftovers, no '=' padding: 1 byte becomes 2 characters, 2 bytes become 3
 if(#t%3==1)s=s..e[flr(t[#t]/4)]..e[t[#t]%4*16]
 if(#t%3==2)s=s..e[flr(t[#t-1]/4)]..e[bor(t[#t-1]%4*16,flr(t[#t]/16))]..e[t[#t]%16*4]
 return s
end

-- decodes a base64 string into a table of numbers
function decode(s)
 local t,d={},b64d
 -- every 4 characters (4 x 6 bits) unpack back into 3 bytes
 for i=1,#s,4 do
  local x,y,z,w=sub(s,i,i),sub(s,i+1,i+1),sub(s,i+2,i+2),sub(s,i+3,i+3)
  add(t,bor(d[x]*4,flr(d[y]/16)))
  -- a short final group leaves z and/or w empty, so fewer bytes are added
  if(#z>0)add(t,bor(d[y]%16*16,flr(d[z]/4)))
  if(#w>0)add(t,bor(d[z]%4*64,d[w]))
 end
 return t
end

-- testing it out
for i in all(decode(encode{1,2,3,4,5}))do
 print(i)
end

I plan to try this on a project of mine which has lots of data tables stored in strings. Not sure if it's a huge improvement over hex when compressed yet, but I thought it was worth trying! Feel free to use / adjust for your own projects. Depending on how things are set up, you can remove the encoder entirely and just have the base64 decoder.
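If the encoder moves out of the cart, it can run as a small offline tool. Here's a rough sketch in standard Lua 5.3. It is not pico8 Lua: it uses the native >>, <<, & and | operators and the string library. It assumes the same dictionary and bit layout as encode() and decode() above. The names b64_encode and b64_decode are hypothetical, and the decoder is only there to check the round trip on the desktop.

```lua
-- Offline encoder sketch for standard Lua 5.3+ (NOT pico8 Lua).
-- Assumes the same 64-character dictionary and bit layout as the cart code,
-- so its output can be pasted into a pico8 string and read by decode().
local DICT = '0123456789abcdefghijklmnopqrstuvwxyz-_+[]{}|:;=.<>?/~`!@#$%^&*()'

-- Reverse lookup: character -> 6-bit value (0-63).
local REV = {}
for i = 1, #DICT do
  REV[DICT:sub(i, i)] = i - 1
end

-- Hypothetical helper: pack a table of bytes (0-255) into dictionary characters.
local function b64_encode(bytes)
  local out = {}
  local function put(v) out[#out + 1] = DICT:sub(v + 1, v + 1) end
  for i = 1, #bytes, 3 do
    local a, b, c = bytes[i], bytes[i + 1], bytes[i + 2]
    put(a >> 2)
    if b == nil then
      put((a & 3) << 4)                -- 1 leftover byte: 2 characters
    else
      put(((a & 3) << 4) | (b >> 4))
      if c == nil then
        put((b & 15) << 2)             -- 2 leftover bytes: 3 characters
      else
        put(((b & 15) << 2) | (c >> 6))
        put(c & 63)                    -- full group: 4 characters
      end
    end
  end
  return table.concat(out)
end

-- Hypothetical desktop mirror of decode(), used only to check the round trip.
local function b64_decode(s)
  local bytes = {}
  for i = 1, #s, 4 do
    local x = REV[s:sub(i, i)]
    local y = REV[s:sub(i + 1, i + 1)] or 0
    local z = REV[s:sub(i + 2, i + 2)]  -- nil if the group is short
    local w = REV[s:sub(i + 3, i + 3)]  -- nil if the group is short
    bytes[#bytes + 1] = (x << 2) | (y >> 4)
    if z then bytes[#bytes + 1] = ((y & 15) << 4) | (z >> 2) end
    if w then bytes[#bytes + 1] = ((z & 3) << 6) | w end
  end
  return bytes
end

print(b64_encode({1, 2, 3, 4, 5}))  --> 0g8310k
```

The string 0g8310k is what decode() on the cart expects, and it unpacks back to 1, 2, 3, 4, 5. With this split, only decode() and the b64d table need to ship in the cart.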


Danjen • 2016-06-30 21:23

I still prefer hex, because it's more widely used and fits nicely into a byte. Plus, instead of dealing with the headache of working in a new base, I'd save a lot of dev time using whatever comes naturally to me.

Overkill • 2016-06-30 22:04

Yeah that's fair, hex is more widely used and the ease of use is definitely a benefit.

However, hex-encoded data doesn't fit in a single byte. Each hex digit represents 4 bits, so it takes two hex characters to represent one 8-bit number. That means you pay a 2x size increase over native binary. Native binary isn't possible in pico8's code section, even if Lua itself allows some unescaped binary characters in strings. base64, on the other hand, represents 3 8-bit numbers with every 4 characters, so the encoded data is only about 1.33x (4/3) the original size.
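The 2x vs 1.33x comparison is simple arithmetic. Here's a small sketch in standard Lua 5.3; hex_len and b64_len are just illustrative names. It counts characters for the unpadded variant that encode() produces. That is 4 characters per full 3-byte group, plus 2 or 3 characters for 1 or 2 leftover bytes, which works out to ceil(4n/3).

```lua
-- Characters needed to store n raw bytes (standard Lua 5.3; names are illustrative).
local function hex_len(n)
  return 2 * n                 -- 2 hex digits per byte
end

local function b64_len(n)
  return (4 * n + 2) // 3      -- ceil(4n/3): 4 chars per 3 bytes, no padding
end

for _, n in ipairs({3, 100, 3000}) do
  print(n .. ' bytes: hex ' .. hex_len(n) .. ' chars, base64 ' .. b64_len(n) .. ' chars')
end
-- 3 bytes: hex 6 chars, base64 4 chars
-- 100 bytes: hex 200 chars, base64 134 chars
-- 3000 bytes: hex 6000 chars, base64 4000 chars
```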

The idea behind this is to make it easier to store a large table of numbers. A literal table of numbers eventually becomes infeasible, because every entry adds to both the token count and the character count, whereas a string literal counts as a single token. Hex is a useful choice, since a short function can parse it at runtime. But with sufficiently large data, that 2x overhead really adds up.

But yeah, it all really depends on how much you need it, ultimately.

On the project I'm working on, I already use an external tool to automate encoding large binaries (animation data, frame data) as hex. That project is basically starved for compressed code size and needs every spare character and token for gameplay code, so this encoding seemed like a good idea. But for smaller projects it's definitely a tad overboard, agreed.

Anyway, I'm simply sharing it in case someone does find it handy. After all, the whole point of a community is to share ideas!

electricgryphon • 2016-06-30 22:25

Pico8 can differentiate capital letters as well (even if you can't type them in the built-in editor), so that can give you another 26 values. These could make good signpost or control characters to indicate where data starts or stops.
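Here's a sketch of that idea. The dictionary above contains no uppercase letters, so a capital can mark where one chunk of data stops and the next starts. The sketch uses standard Lua 5.3 and string.find, which pico8 doesn't provide, so it suits an offline tool or desktop tests. On the cart, the same scan would be done character by character with sub(). The name split_records and the choice of 'X' as the separator are hypothetical.

```lua
-- Split a packed data string on a separator character (standard Lua 5.3).
-- Assumes the separator never appears inside the encoded data; any uppercase
-- letter qualifies, since the custom dictionary has none.
local function split_records(s, sep)
  local records = {}
  local start = 1
  while true do
    local pos = string.find(s, sep, start, true)  -- plain search, no patterns
    if pos == nil then
      records[#records + 1] = s:sub(start)
      return records
    end
    records[#records + 1] = s:sub(start, pos - 1)
    start = pos + 1
  end
end

-- Two records separated by 'X'; each piece could then go through decode().
for i, rec in ipairs(split_records('0g83X10k', 'X')) do
  print(i, rec)
end
```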

Overkill • 2016-07-16 23:32

Hmm, just as a follow-up to this, in case anyone was considering base64 encoding for their projects.

After testing, it seems that base64 won't beat hex on average, because the PICO-8 compression is byte-based. Fitting 3 bytes of data into 4 characters of code raises the entropy at the byte level. The same byte value turns into different characters depending on its position within a 3-byte group, so repeated patterns in the data no longer show up as repeated characters. This means the compressor finds fewer matches.

Not sure what compression p8 uses under the hood, but if I had to guess, it's probably the compress function zep posted before, which combined raw data, RLE and LZ methods. These all work at the byte level and will perform badly here, because the data is encoded before compression and its values are no longer byte-aligned.
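Here's a small sketch in standard Lua 5.3 that illustrates the alignment problem. It does not model pico8's actual compressor, which is only guessed at above. The sketch encodes a byte pattern that repeats every 2 bytes, then measures the shortest repeat period of each encoded string. In hex the period stays at 4 characters. In base64 the 2-byte pattern only lines up with the 3-byte groups again every 6 bytes, so the period stretches to 8 characters. hex, b64 and period are hypothetical helper names, and b64 only handles full 3-byte groups.

```lua
-- Illustration only: how a 2-byte repeating pattern looks in hex vs base64.
local DICT = '0123456789abcdefghijklmnopqrstuvwxyz-_+[]{}|:;=.<>?/~`!@#$%^&*()'

local function hex(bytes)
  local out = {}
  for i, b in ipairs(bytes) do out[i] = string.format('%02x', b) end
  return table.concat(out)
end

-- Same bit layout as encode(); full 3-byte groups only.
local function b64(bytes)
  assert(#bytes % 3 == 0, 'demo handles full 3-byte groups only')
  local out = {}
  local function put(v) out[#out + 1] = DICT:sub(v + 1, v + 1) end
  for i = 1, #bytes, 3 do
    local a, b, c = bytes[i], bytes[i + 1], bytes[i + 2]
    put(a >> 2)
    put(((a & 3) << 4) | (b >> 4))
    put(((b & 15) << 2) | (c >> 6))
    put(c & 63)
  end
  return table.concat(out)
end

-- Smallest p such that every character equals the one p positions earlier.
local function period(s)
  for p = 1, #s do
    local ok = true
    for i = p + 1, #s do
      if s:sub(i, i) ~= s:sub(i - p, i - p) then ok = false; break end
    end
    if ok then return p end
  end
  return #s
end

local data = {}
for i = 1, 12 do data[i] = (i % 2 == 1) and 1 or 2 end  -- 1,2,1,2,...

local h, s = hex(data), b64(data)
print(string.format('hex %s period %d', h, period(h)))  --> hex 010201020102010201020102 period 4
print(string.format('b64 %s period %d', s, period(s)))  --> b64 0g810w420g810w42 period 8
```

The base64 string is shorter, but the pattern a byte-level matcher could exploit is twice as long.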

In one particular case, I had some animation data that took 1000 fewer characters of source code when encoded as base64 instead of hex. But when I loaded the cart, the compressed code was 1000 characters larger than with the hex version! In other cases the difference was only a few hundred characters. But overall, pico8's compression seems to perform worse on arbitrary base64 data than on hex.

For this reason, I recommend just using hex, which you can decode like this:

function unhex(s)
 local t={}
 for i=1,#s,2 do
  -- '0x' plus two hex digits; adding 0 coerces the string to a number
  add(t,('0x'..sub(s,i,i+1))+0)
 end
 return t
end
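The hex strings for unhex() still have to be produced somewhere, for example by an external tool like the one mentioned earlier in the thread. Here's a minimal sketch of that encoding side in standard Lua 5.3 (not pico8 Lua). It assumes the data is available either as a table of byte values or as a raw Lua string, such as file contents. tohex, str_tohex and fromhex are hypothetical names, and fromhex is only a desktop mirror of unhex() for checking the round trip.

```lua
-- Offline sketch for standard Lua 5.3+ (NOT pico8 Lua): produce the hex
-- string that unhex() reads on the cart. Names are hypothetical.
local function tohex(bytes)
  local out = {}
  for i, b in ipairs(bytes) do
    out[i] = string.format('%02x', b)  -- always two lowercase digits per byte
  end
  return table.concat(out)
end

-- Same idea for raw binary data held in a Lua string (e.g. read from a file).
local function str_tohex(data)
  return (data:gsub('.', function(ch) return string.format('%02x', ch:byte()) end))
end

-- Desktop mirror of unhex(), only for checking the round trip.
local function fromhex(s)
  local t = {}
  for i = 1, #s, 2 do
    t[#t + 1] = tonumber(s:sub(i, i + 1), 16)
  end
  return t
end

print(tohex({1, 2, 255, 16}))  --> 0102ff10
```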
