After discussing base64 encoding in a recent thread, I decided to try my own version of it for storing level strings, but after some experimentation I started wondering how much of a storage benefit it really offers vs. hexadecimal. I also came across a 2016 post from user Overkill, in which he concludes that unfavorable interaction with PICO-8's built-in compression basically negates base64's byte-per-character advantage.

https://www.lexaloffle.com/bbs/?tid=3738

I decided to perform a test of my own to measure the difference precisely. Using some web-based tools, I generated random strings in both hexadecimal and my own base64 character set, which uses as many 1-byte characters as possible (59, to be specific -- certain characters like small caps and glyphs take 2 bytes). I generated very large strings and pared them down just enough to fill the compressed data limit to exactly 100.00% with each character set. I ran 3 trials per encoding, recording how many characters would fit each time, then averaged the results for a general comparison. Results are as follows:

Trial 1
Base64: 15846 characters max.
Hex: 23230 characters max.

Trial 2
Base64: 15776 characters max.
Hex: 23261 characters max.

Trial 3
Base64: 15757 characters max.
Hex: 23275 characters max.

Averages
Base64: 15793 characters max. -- 11,845 bytes
Hex: 23255 characters max. -- 11,628 bytes
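
For reference, the byte figures are just the raw data each character set carries: base64 stores 6 bits (0.75 bytes) per character, so 15793 × 0.75 ≈ 11,845 bytes; hex stores 4 bits (0.5 bytes) per character, so 23255 ÷ 2 ≈ 11,628 bytes.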

From these trials, base64 fits only about 2% more raw data into the compressed limit than hexadecimal does, which surprised me. While there could be cases in which a custom encoding offers real benefits, in general I think I'll stick with hexadecimal. For me, a negligible increase in storage efficiency is more than offset by much easier conversion to and from byte values and some degree of readability.
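
For anyone curious, here is a minimal sketch of the sort of hex round-trip I have in mind -- not taken from any particular cart, the function names are just placeholders, and it assumes PICO-8 0.2.0+ (where tonum() accepts "0x"-prefixed strings and tostr(n,true) prints the 16.16 hex form):

  -- decode a hex string into a table of byte values
  function hex_decode(s)
   local bytes={}
   for i=1,#s,2 do
    add(bytes,tonum("0x"..sub(s,i,i+1)))
   end
   return bytes
  end

  -- encode a table of bytes (0-255) as a hex string.
  -- tostr(b,true) prints the full value as "0x00nn.0000",
  -- so characters 5-6 are the low byte of the integer part
  function hex_encode(bytes)
   local s=""
   for b in all(bytes) do
    s=s..sub(tostr(b,true),5,6)
   end
   return s
  end

  -- e.g. hex_encode({72,101,108,108,111}) --> "48656c6c6f"

Nothing fancy -- the point is just that hex maps straight to bytes with no lookup table, whereas a 64-character alphabet needs its own mapping plus bit packing.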

P#75328 2020-04-25 07:59 ( Edited 2020-04-25 08:02)

Note that the new PICO-8 0.2.0 compression format no longer favours those 59 characters and is now charset-agnostic, so there is no longer any reason to change the character mapping.

I believe your tests show a few things about the new compression algorithm:

  • like the previous one, it is not efficient enough to nullify the 33% size expansion that base64 encoding induces (see the note after this list)
  • compression of random hex data has improved; it used to compress about 12% worse than what you measured
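
(For context, the 33% figure is just the density ratio: each base64 character carries 6 bits of data but occupies a full 8-bit byte of source, so raw data grows by 8/6 ≈ 1.33× before compression.)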

It’s mere coincidence that hex and base64 data can now encode roughly the same amount of random information, but it’s interesting nonetheless! However, for most purposes it’d be more interesting to test with non-random data (such as images, JSON, or other structured information).

P#75337 2020-04-25 10:25

I was wondering if there had been a change to the compression algorithm in 0.2.0, because some simpler testing I did a while ago showed about a 12% advantage for base64, like you said. To get a completely accurate picture it would help to test with specific types of ordered data, but I suspect the difference won't be huge in any case.

That's pretty cool: in addition to all the other nice changes in 0.2.0, we get higher-density string storage without the unreadable custom formatting. =)

P#75340 2020-04-25 10:47 ( Edited 2020-04-25 11:06)

Another factor to consider, in rare cases, is a cart whose source compresses well enough that the 64k uncompressed source limit becomes the binding constraint. In cases like that, base64's higher data density per character can still make it worthwhile.

P#75366 2020-04-25 17:46
