Cartridge storage and code compression scheme

asterick • 2015-09-01 23:48

So, I've been spending quite a bit of time these last few days trying to figure out the inner workings of the PICO-8, especially the cartridge storage, and I've basically got everything figured out at this point. One thing I thought might be helpful is giving everyone a glimpse into the compression format used inside the .PNG cartridges, so you can avoid hitting that pesky 32kB code wall!

So, first I will break down the storage format for the current version of the .PNG. Note that so far this has been a clean-room reverse-engineering effort, so if I'm duplicating anything you already know, or that has been described elsewhere, just ignore me and move on.

Cartridge data is stored in the 2 lowest bits of each color channel, so each pixel holds 8 bits (1 byte) of data. The channels are ordered ARGB from MSB to LSB (this is not the raw uint8 order from the image container, so keep that in mind). The images are 160x205 pixels, giving a theoretical capacity of 32800 bytes, although only 32769 bytes (0x0000~0x8000) are used. Everything past this is discarded.
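A minimal Python sketch of the bit packing just described, assuming you have already decoded the PNG into a row-major sequence of (r, g, b, a) tuples with some image library. pixels_to_bytes is a hypothetical name, not part of any existing tool:

```python
from typing import Iterable, Tuple

CART_BYTES = 0x8001  # offsets 0x0000..0x8000 inclusive = 32769 bytes

def pixels_to_bytes(pixels: Iterable[Tuple[int, int, int, int]]) -> bytes:
    """Rebuild cart bytes from (r, g, b, a) pixels (hypothetical helper).

    Only the two lowest bits of each channel carry data; the byte is
    assembled MSB to LSB as A, R, G, B.
    """
    out = bytearray()
    for r, g, b, a in pixels:
        out.append(((a & 3) << 6) | ((r & 3) << 4) | ((g & 3) << 2) | (b & 3))
        if len(out) == CART_BYTES:
            break  # pixels past offset 0x8000 are discarded
    return bytes(out)
```

The remaining 31 pixels of a 160x205 image are simply never read.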

0x0000~0x42FF:  RAM initializer (simply copied to RAM on load; see the manual's memory map for details)
0x4300~0x7FFF:  Code (version 0 is ASCII encoded, version 1 may be compressed; see below)
0x8000:       File version
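Splitting the 32769 decoded bytes into those three regions is just slicing. A sketch, where Cart and split_cart are illustrative names:

```python
from typing import NamedTuple

class Cart(NamedTuple):
    ram_init: bytes  # 0x0000..0x42FF, copied to RAM on load
    code: bytes      # 0x4300..0x7FFF, plain ASCII or compressed
    version: int     # single byte at 0x8000

def split_cart(data: bytes) -> Cart:
    """Split raw cart bytes into regions (hypothetical helper)."""
    if len(data) < 0x8001:
        raise ValueError("expected at least 0x8001 bytes of cart data")
    return Cart(data[0x0000:0x4300], data[0x4300:0x8000], data[0x8000])
```

Note that the code region is 0x3D00 (15616) bytes, which is why compression matters.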

This is all fairly straightforward and should not surprise anyone. If the version is 1, the code is likely compressed and uses the following format:

0x0~0x3: the string ":c:\x00"
0x4~0x5: length of code (decompressed, big-endian)
0x6~0x7: always zero (could be used for compression scheme in the future?)
0x8+:    Compressed data
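Parsing that header might look like this. parse_code_header is a hypothetical helper, and treating anything without the ":c:\x00" marker as plain ASCII is an assumption based on the version 0 description above:

```python
import struct
from typing import Optional, Tuple

MAGIC = b":c:\x00"

def parse_code_header(code: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (decompressed_length, compressed_payload), or None if the
    code region does not start with the compression marker."""
    if code[:4] != MAGIC:
        return None
    # Bytes 4-5: big-endian unsigned 16-bit length.
    (length,) = struct.unpack(">H", code[4:6])
    # Bytes 6-7 have only been observed as zero, so they are skipped.
    return length, code[8:]
```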

Compressed data uses an LZ-like format with a reduced character set, so common characters take a single byte. The decoder reads 8-bit codes, each of which emits one or more characters, and repeats until the output length matches the value stored in the header.

The codes are as follows

0x00 xx: "Extended" code.  The following byte is copied to the output stream
0x01: Emit a new line
0x02~0x3B: Maps to: " 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_"
0x3C~0xFF xx: Copy from buffer
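The single-byte codes are a plain table lookup. This sketch (encode_literal is a hypothetical helper) emits one character without using any back-references:

```python
# Short-form table for codes 0x02..0x3B, in the order listed above.
CHARS = " 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_"
assert len(CHARS) == 0x3B - 0x02 + 1  # 58 entries

def encode_literal(ch: str) -> bytes:
    """Encode a single character as a literal code."""
    if ch == "\n":
        return b"\x01"
    i = CHARS.find(ch)
    if i >= 0:
        return bytes([0x02 + i])
    # Anything else falls back to the two-byte extended code 0x00 xx.
    return bytes([0x00, ord(ch)])
```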

A copy code reads one more byte (code2), then copies length bytes starting offset bytes back from the end of the output stream so far:

offset = (code - 0x3C) * 16 + (code2 & 0xF);
length = (code2 >> 4) + 2;
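Putting the codes and the copy formula together, a minimal decoder could look like the following. It follows the rules described in this post rather than the PICO-8 source, and decompress is an illustrative name:

```python
CHARS = " 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_"

def decompress(payload: bytes, length: int) -> str:
    """Decode the stream that follows the 8-byte header.

    `length` is the decompressed size from header bytes 4-5.
    """
    out = []
    i = 0
    while len(out) < length:
        code = payload[i]
        i += 1
        if code == 0x00:              # extended: next byte is a literal
            out.append(chr(payload[i]))
            i += 1
        elif code == 0x01:            # newline
            out.append("\n")
        elif code <= 0x3B:            # short-form character
            out.append(CHARS[code - 0x02])
        else:                         # back-reference, two bytes
            code2 = payload[i]
            i += 1
            offset = (code - 0x3C) * 16 + (code2 & 0xF)
            count = (code2 >> 4) + 2
            start = len(out) - offset
            if offset == 0 or start < 0:
                raise ValueError("back-reference points outside the output")
            # Copy one character at a time, so a count larger than the
            # offset re-reads characters written by this same copy.
            for k in range(count):
                out.append(out[start + k])
    return "".join(out)
```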

As a worked example, this is how "Hello hello" would be encoded:

// Header
"3a", "63", "3a", "0", "0", "b", "0", "0", 
// Encoded text
"0", "48",  // "H"
"11", "18",  "18",  "1b", // "ello"
"2", // [space]
"14", // "h"
"3c", "26" // "ello", 4 bytes copied from 6 bytes ago
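To double-check the arithmetic, this sketch rebuilds the example bytes from the rules above. short_code and encode_copy are hypothetical helpers; encode_copy simply inverts the offset/length formulas:

```python
import struct

CHARS = " 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_"
MAX_OFFSET = (0xFF - 0x3C) * 16 + 0xF  # 3135
MAX_LENGTH = 0xF + 2                   # 17

def short_code(ch: str) -> int:
    """Single-byte code for a character in the short-form table."""
    return 0x02 + CHARS.index(ch)

def encode_copy(offset: int, length: int) -> bytes:
    """Inverse of offset = (code - 0x3C) * 16 + (code2 & 0xF)
    and length = (code2 >> 4) + 2."""
    if not 1 <= offset <= MAX_OFFSET:
        raise ValueError("offset out of range")
    if not 2 <= length <= MAX_LENGTH:
        raise ValueError("length out of range")
    return bytes([0x3C + offset // 16, ((length - 2) << 4) | (offset % 16)])

text = "Hello hello"
example = (
    b":c:\x00" + struct.pack(">H", len(text)) + b"\x00\x00"  # header
    + bytes([0x00, ord("H")])                                 # extended "H"
    + bytes(short_code(c) for c in "ello h")                  # short codes
    + encode_copy(6, 4)                                       # copies "ello"
)
print(" ".join(f"{b:x}" for b in example))
# 3a 63 3a 0 0 b 0 0 0 48 11 18 18 1b 2 14 3c 26
```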

Note: there are no capital letters in the short-form encoding, so avoid them in your program where you can; each one costs two bytes through the extended code. Also, the maximum number of characters that can be copied from a previous point in the stream is 17, so try to keep your variable names below this threshold. Interestingly enough, minus signs are not in the base character set either.
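A quick way to see what the short-form table costs you: this sketch counts the bytes a literal-only encoding would need, charging 2 bytes for anything outside the table (uppercase letters, '-', quotes, and so on). A sensible encoder only uses back-references when they save space, so treat the result as an upper bound. literal_cost is a hypothetical helper:

```python
CHARS = " 0123456789abcdefghijklmnopqrstuvwxyz!#%(){}[]<>+=/*:;.,~_"

def literal_cost(source: str) -> int:
    """Bytes needed if every character is emitted as a literal code:
    1 for a newline or a short-form character, 2 for the 0x00 xx escape."""
    return sum(1 if ch == "\n" or ch in CHARS else 2 for ch in source)

print(literal_cost("hp=1"))  # 4
print(literal_cost("HP=1"))  # 6
print(literal_cost("x-=1"))  # 5
```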

impbox • 2015-09-02 00:54

ooh thanks so much! I was stuck on decoding the code data! \o/

Scaevolus • 2015-09-03 04:00

Good work!

One tip for people implementing this: remember that a copy's length can be longer than its offset, which gives you RLE. For example, '{},{},{},{},{},{},{},' might compress to the literal '{},' followed by a copy(offset=3, length=17) directive and one more ','.
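For this to work, the copy has to run forward one byte at a time; a block copy of the source region stops at the end of the data that already exists. Since a single copy is at most 17 bytes, this sketch covers the example string with a 17-byte copy plus one trailing literal ','. copy_back is a hypothetical helper:

```python
def copy_back(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes starting `offset` bytes before the end of `out`,
    one byte at a time so the source may overlap what is being written."""
    start = len(out) - offset
    for k in range(length):
        out.append(out[start + k])

buf = bytearray(b"{},")
copy_back(buf, 3, 17)       # longest single copy allowed
buf += b","                 # one more literal to finish the run
print(buf.decode())         # {},{},{},{},{},{},{},

# A block copy of the source region only sees the 3 bytes that exist yet:
naive = bytearray(b"{},")
start = len(naive) - 3
naive += naive[start:start + 17]
print(naive.decode())       # {},{},
```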

I analyzed the carts in the 'cartridge' section. Using zlib to compress the sprites/maps/sound and the code together would give plenty of headroom: cart size analysis tables

pico-8 already includes DEFLATE compression routines (lodezlib) as part of its PNG encoding.
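To try a comparison like this on your own carts, here is a rough sketch using Python's standard zlib module at level 9. It is not PICO-8's own routine or settings, so treat the numbers as estimates; deflate_size and the placeholder data are hypothetical:

```python
import zlib

def deflate_size(ram_init: bytes, code: str) -> int:
    """Size of sprites/map/sound plus code DEFLATE-compressed together."""
    return len(zlib.compress(ram_init + code.encode("ascii"), 9))

# Usage with placeholder data (not a real cart):
ram = bytes(0x4300)
src = "for i=0,127 do\n pset(i,i,7)\nend\n" * 20
print(deflate_size(ram, src), "bytes vs", len(ram) + len(src), "raw")
```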

asterick • 2015-09-03 13:42

@Scaevolus: I'm assuming the decompression routine just keeps a pointer into the output buffer and copies forward from it byte by byte, so overlapping copies repeat naturally. This is fairly common with LZ routines.

And yes, I agree that it would make sense to just dump the custom compression and just compress map + tile data whole hog.

Also: Edited the doc to give the right channel order

AfBu • 2015-09-03 18:39

Damn good work, thank you! Was wondering how data is saved.

dddaaannn • 2015-10-15 18:21

Recent carts have a file version of 5. I don't see any differences in the code compression part. Anyone notice other differences?

dddaaannn • 2016-10-19 01:37

For the archaeologists finding this thread looking for the code compression and decompression routines, the picotool library has open source reference implementations in Python: https://github.com/dansanderson/picotool
