Just a few things I thought I’d share. Sorry if it’s gibberish and/or uninteresting to most people.
Many carts store extra data in the code section, most of the time inside strings. Some people will store structured data such as JSON or JSON-like data and let the PICO-8 compression mechanism deal with it, but you must have seen people who rolled their own compression mechanism and use custom formats such as modified base64 that takes advantage of the 59 “good” characters (i.e. the ones that needed only one byte of storage in the .p8.png format instead of two).
Such base64 data encodes 6 bits of information per character, and requires an average of 8.625 bits per character on the old format cartridge (pre-0.2.0). It means that this “alphabet” gave us about 5.565 bits of information per byte on the cartridge (8 * 6 / 8.625).
The new compression format is now charset-agnostic; there is no reason to use custom encodings and favour those 59 characters. We can therefore use standard base64 as far as only storage is concerned (when thinking about decoder complexity, using a contiguous alphabet would be smart):
This data still encodes 6 bits of information per character, but now requires an average of 8 bits per character on the new format cartridge, giving us exactly 6 bits of information per byte on the cartridge (better than the old compression method).
So I wondered: what about other alphabets? (of more or less than 64 characters)
Here are the theoretical results, assuming perfectly random data and ignoring the fact that the compression algorithm may emit back references:
Here are my conclusions:
- the new compression algorithm always outperforms the old one for random data
- starting from about 40, alphabet size no longer really matters (there are “peaks” around 48 and 112 but probably not worth exploiting)
In a future post I will study the effect of the back reference emission (which are rare on random data but non-negligible with small alphabet sizes), but I’ll wait for 0.2.0e to be out since Zep mentioned on Twitter that he fixed something in the compression.