Log In  

UPDATE!

PX8 has been replaced by PX9: https://www.lexaloffle.com/bbs/?tid=34058

But I'll leave this here for reference, and for existing projects using PX8.


Cart #25919 | 2016-07-26 | Code ▽ | Embed ▽ | License: CC4-BY-NC-SA
46

This is a library mostly for compressing graphics and maps, but can also be adapted to compress sfx. It is designed for data-heavy carts and requires around 450 tokens for decompression, although this can be reduced if needed by removing remap(), hard-coding parameters, and/or removing predicted spans at the cost of compression performance. If someone wants a smaller/weaker version, let me know!

To use it, compress a 2D rectangle to an address in memory, and supply a function for fetching the source values (normally SGET or MGET). For map data, you probably want to set p.cbits to something like 4 first.

COMP(0, 0, 128, 64, 0x2000, SGET)

0x2000 is the address of the top half of the map, so this will compress the top half of the sprite sheet (128 sprites) and write the compresed data over the map data.

DECOMP() takes the memory address of the compressed data, the top left corner of where to decompress to, and functions for getting and setting decompressed values. So to decompress this data at 0x2000 back to the screen, starting 32 pixels down:

DECOMP(0x2000, 0, 32, PGET, PSET)

Only the decompression section of the code is needed once you have compressed data stored on a cart. A typical workflow would be to make a utility cartridge that grabs data from multiple source cartridges (using RELOAD() with a 4th parameter to indicate where to read from), and then CSTORE them to a single cartridge (again, using CSTORE()'s 4th external cart parameter).

Here's an example that stores 2-byte lengths at the start of each compressed block, to allow seeking out the start of any given gfx. I've commented the cstore and reload lines so that it will work if you paste it at the end of the main PX8 cart example:

-- px8 workflow example:
-- storing multiple compressed images and fetching them

-- 1. make a utility cart that
-- compresses all the needed
-- data to a single cart

local base_offset = 0x2000
local offset = base_offset

-- compress some gfx and
-- add the length of the compressed
-- data at the start (2 bytes)
function add_gfx(x,y,w,h)

 local len = comp(x,y,w,h,offset+2,sget)
 poke(offset+0,len%256)
 poke(offset+1,len/256) 
 printh("wrote "..offset)
 offset += len+2

end

-- jelpi frames + mushroom
add_gfx(0,24,40,8)
add_gfx(40,0,8,8)

-- could load another cart's
-- spritesheet at any time
-- reload(0,0,0x1000,"blah.p8")

-- vegetation
add_gfx(48,16,32,16)

-- store in target cart
-- cstore(0x2000,0x2000,(offset-base_offset),"out.p8")

----------

-- 2. load the compressed data
-- (from the cart it was
-- compressed to)

base_offset = 0x2000

-- skip through compressed data
-- blocks and load the one at
-- index
function load_gfx(index,x,y)

 local offset=base_offset
 for i=0,index-1 do
  offset += peek(offset+0) + peek(offset+1)*256 + 2
 end

 -- use sget,sset to write back
 -- to the spritesheet instead
 -- of the screen
 decomp(offset+2,x,y,pget,pset)

end

-- test

cls()
load_gfx(0, 20,10)
load_gfx(1, 20,40)
load_gfx(2, 80,10)

The Algorithm


I started working on PX8 while working on PICO-8's specs, as an important question was how large a game could theoretically be for hard-core users who want to go to the trouble of compressing stuff. It became something of a brainworm, and releasing PX8 is a way to get this out of my system. I hope it is also useful to someone, or at least interesting.

PX8 is (AFAIK) a novel algorithm that appears to work well for typical PICO-8 data, and out-performs pngcrush -brute for the few 16-colour images I tested. Alternating spans of predicted and non-predicted values are stored:

  1. Predicted values (colours) are calculated by maintaing a table of the last encountered matching neighbours: if a match top & left is found, that is taken to be the prediction. Otherwise, a match for only top, and then only left are considered. Failing those, the value is taken to be non-predicted.

  2. Non-predicted values are stored as indexes into CLIST; a list of literal values that are stored in the order they were last encountered. This means that recently encountered values have smaller indexes, and the encoding exploits this, along with the fact that each index can not possibly be for the failed prediction (in which case it would be part of a predicted span).

Each span is strictly made of either all predicted or all non-predicted values. This means that for predicted spans, no additional information needs to be stored except the length of the span itself. And conversely for unpredicted values, the index into CLIST is known to not point at the predicted value. This means that no colour index data needs to be stored for 2 colour images at all, and the compressed data is composed entirely of span lengths.

P#25922 2016-07-26 12:59 ( Edited 2019-04-26 18:43)

Mindblowing! THX a lot!

Glad you killed that brainworm finally. :D

Svnt

P#25939 2016-07-26 18:52 ( Edited 2016-07-26 22:52)

This is way past cool!
By chance, could there be a version of this that loads 4-color sprites (3 + transparency), or even 2-color sprites?
(I'm not really good with coding, especially compression and what-not.)

P#25947 2016-07-26 21:55 ( Edited 2016-07-27 01:55)

I was wondering if it could be modified to store maps that only use X number of tiles so that if you're not using the entire sprite space your compression could be much smaller. It would be great for a puzzle game that only uses a few tiles.

P#25948 2016-07-26 23:58 ( Edited 2016-07-27 03:58)

Does the prediction method outstrip doing copies from historic data by much?

I sometimes think there must be a good way to do compression with recursive markov chains that are based on previously-decompressed data. I'd need an optimal way of stringing together chains based on prediction and correction-for-misprediction, and also a way of inserting novel data. Yours is very much like this idea, except it uses fixed-length chains (3 pixels) without recursion and only builds with prediction and novel data. I guess novel data could direct the next prediction, so maybe that would stand in for correcting misprediction, hmm.

P#25958 2016-07-27 04:05 ( Edited 2016-07-27 08:37)

@Skyrunner65, @Connorses

PX8 should do a decent job of compressing low-colour images or maps with few tiles, at any bit-depth. I get around 70% compression for 4-colour sprites and 80% for sparse maps like Jelpi's. i.e. you could fit around 3x as many 4-colour sprites or 5x as much map data.

@Felice
It doesn't do much better than LZ-style compression at all! Around 5~10% better for images with little repetition, and (not surprisingly) performs worse when there is something like many similar frames of animation in a spritesheet. The LZ code I tried has room for improvement too, and is potentially cheaper tokens wise. I'd like to revisit it later and make something that can be swapped out for PX8 with the same comp/decomp interface. The only real downside is compression speed, but that's not such a big deal if it's only needed when generating the release version of a cart.

I'd love to see what happens with better predictive models like MC / RMC, although it would of course put strain on the token count and memory consumption (I forgot to mention, PX8 uses half the Lua RAM (512k) on decompress in the worst case!). The other challenge as you mentioned is figuring out how to efficiently encode the predictions and mispredictions -- the current PX8 span-based method is a rather clunky way of exploiting structure in the sequence of predictions.

Argh, compression is strangely addictive.

P#26058 2016-07-28 23:00 ( Edited 2016-07-29 03:37)

It is, isn't it? :D

Generic compression is kind of a dry subject, actually, but when you start analyzing actual, specific data sets to find certain exploitable, predictable features, that makes it fun.

By the way, I was thinking about this just yesterday and I remembered that a friend of mine did a similar up/left history+prediction to compress maps in a snes game. I was busy with compressing single sprites at the time (a very unrewarding endeavor), so I didn't ever get more than a bird's eye view of the map compression algorithm, and I can't even remember that view of it now. I really wish I had the source code for that game. :/ So much of what we did would be useful again in PICO-8.

P#26078 2016-07-29 07:48 ( Edited 2016-07-29 11:50)

This is really astounding. Now I put together a compressor recently, about 25-lines of code, it's great for compressing audio - but it PALES in comparison to your mighty one that CRUNCHES data very hard.

Started reading the source. Hoo boy ! Some real long-hair stuff. Both out of my range of expertise and understanding !

P#30596 2016-10-10 21:12 ( Edited 2016-10-11 01:12)

Good job. This works out much better than your RLE compression test, as a more complete solution. I was making my own compressor/decompressor along those lines at some point, but I ... got distracted. I'm a cat, after all.

P#30598 2016-10-10 21:29 ( Edited 2016-10-11 01:29)

has anyone tested this with maps?
It seemed to work at my first test, but if I try to copress my levels I get only corrupted data...

reload(0,0,0x3100,"level_1_1.p8")

------------- compression ---------------

clen = comp(0,0,128,32,0x6000,mget)
decomp(0x6000,0,0, pget, mset)
------------------------------------------

cstore( 0,0,0x3100)

without comp/decomp, the data from "level1.p8" is correctly loaded^^
Here is "level_1_1.p8":

Cart #37235 | 2017-02-07 | Code ▽ | Embed ▽ | No License

edit: Ok, the API makes absolutely no sense to me... how is this supposed to be used, when not working with images on the DisplayBuffer? Especially with own xset / xset functions, seems that xset will be called with negative values?!

This one destroys the palette :\

function _set(x,y,val)
  return poke(0x4300 + y*128+x,val)
end 

cls()
w=128 
h=30 
bpp=8
rawlen = (w*h)*(bpp/8)

reload(0,0,0x3100,"mario_l3 - Kopie.p8")

-- compression --
clen = comp(0,0,w,h,0x6000,mget)
decomp(0x6000,0,32, pget, _set)
for iy =0, h do 
  for ix = 0, w do 
      mset(ix,iy,peek(0x4300 + iy*128+ix))
  end 
end 
cstore( 0,0,0x3100)

why do the functions not work lineary on the memory? Just like the 'compression test' -thing

comp(source_addr, destination_addr, length) 
decomp(source_addr, destination_addr, length) 
P#37232 2017-02-06 19:49 ( Edited 2017-02-07 02:42)

This should do the trick:

comp(0,0,64,32,0x6000,mget)
decomp(0x6000,0,0, mget, mset)

edit: <snip> actually, I'm not entirely sure if comp should use 64 as the width (map values = byte, pixel = nybble) or not - seems to work ok with 128. I suggest you check that yourself, as you'll be able to spot errors in your map data quicker than I can.

You need to use mget in the decomp function instead of pget. This one took me a while to figure out, but in a nutshell, the algorithm looks at previously decompressed values (to determine whether it can predict the current value or not), which is what that parameter is used for.

Observation while debugging this: take care if you decide to replace pget/mget etc with your own implementation that uses peek/poke, the x,y coordinates will sometimes be out of bounds.

To be fair, px8 could really benefit from having a brief explanation of the various parameters - I'm still not entirely sure what having a cbits value larger than 1 is useful for.

P#37238 2017-02-06 21:53 ( Edited 2017-02-08 02:48)

Thank you very much. Now it works also with custom xget/xset functions:

clen = comp(0,0,tilemap_width, tilemap_height ,0x6000,src_get)

tilemap_width = peek(0x6000)
tilemap_height =  peek(0x6001)
decomp(0x6000,0,0, _mget, _mset)

...

function src_get(x,y)
  if x < 0 or x >= tilemap_width then 
    return 0
  elseif y < 0 or y >= tilemap_height then 
    return 0
  end 
  return tilemap[y+1][x+1]
end 

function _mget(x,y)
  if x < 0 then 
    return 0
  elseif y < 0 then 
    return 0
  end 
  return peek(0x4300 + x + y * tilemap_width)
end 
function _mset(x,y,val)
  poke(0x4300 + x + y * tilemap_width, val)
end 

Results are pretty good, with:

p.cbits  = 4   -- max:7
p.remap = false

Compressed maps are only 358 from 2505 bytes, for example

P#37252 2017-02-07 05:13 ( Edited 2017-02-07 10:27)

Hi all,

I'm loving these PX8 routines (I can't believe I never tried using them before!). But I wonder if I could pick your brains for how I'd prefer to use it - as the way I'm currently implementing it seems "inefficient".

GOAL

I want to store "pages" of graphics (e.g. blocks of 128x64, using the add_gfx() process suggested above) into a single cart's Sprite Sheet memory
(I've already got this part working...)

Then I want to be able to pick, say two "pages" of compressed data and extract them DIRECTLY (or as close to) back into the Sprite Sheet - effectively replacing ALL of the compressed data with uncompressed gfx.
(I'm close to getting this working, by using User Memory to decompress a page/block at a time, then MemCpy'ing back - but it's slower, uses more tokens, and I think flawed as I could overwrite chunks of encoded data that I planned to decompress next).

P.S. When I want to switch gfx data again, I can simply call Reload() to get the uncompressed data back and start again - seems pretty instant.

OPTIONS

OPTION 1:
I know I could to this all to the screen buffer (8k), then MemCopy back in one hit - but I don't want the user to *see* the decompression process. It's a shame User Memory isn't 8k in moments like this.

OPTION 2:
I know I could decompress part of the gfx to another area of memory (e.g. SFX) in addition to User Memory, but it could mean losing up to 4k - which leaves little sfx/music :o/

OPTION 3: ???

Can anyone got any ideas of how best to do this, ideally with as little tokens used? (I'm gonna use a diff. cart for the compression/storage, so it's only the decompression part that needs to be optimal)

For info, the custom functions I'm using with load_gfx/decomp to decompress to Memory (instead of PSET/MSET), is as follows:

function memget(x,y)
 local addr = 0x4300 + flr(y)*64 + x/2
 local pbyte = peek(addr)
 local left = pbyte % 16
 if x%2 == 0 then
   return left
 else
   return (pbyte - left) / 16
 end
end

function memset(x,y,col)
 local addr = 0x4300 + flr(y)*64 + x/2
 local left = peek(addr) % 16
 local right = (peek(addr) - left) / 16
 local val
 if x%2 == 0 then
   val = col+16*right
 else
   val = left+16*col
 end
 poke(addr,val)
end 

Many thanks in advance! :D

P#51996 2018-04-25 08:15 ( Edited 2018-04-25 13:33)

@Liquidream I don't think there's a risk of anyone seeing the use of the screen memory as a scratch buffer. Screen memory is only copied to the display on flip() or the end of _draw(), and this is all single threaded so that won't happen until you're done.

I never realized that calling reload() on the current cart is virtually instantaneous and doesn't have the artificial delay! Seems like you could use all of addressable memory for this. :)

P#52008 2018-04-25 11:27 ( Edited 2018-04-25 15:27)

Thanks Dan,

That's what I originally thought, but I think I was seeing it draw to the screen without having Flips().

I'm wondering if the problem is that, even though I'm not calling one, a Flip() is happening as the Decomp() takes more than a few milliseconds to complete?

Either way, I'd def prefer to use the screen mem for a scratch pad (if I can do it between draws) :D

Yeah, feels kinda sneaky, right?
I was rather chuffed when I found this - let's put it that way! :D

P#52009 2018-04-25 12:30 ( Edited 2018-04-25 16:31)

If you take more than a certain amount of time to call flip(), it seems to occur on its own. This is why tweetcarts can be written without using flip().

Personally, I'd reload() the compressed sprite ram data at 0x4300, since you're allowed to set a different destination address in ram than the source address in rom, and decompress it directly from there into sprite ram.

If it's quick enough and you need 0x4300 for other stuff, use the screen as the reload() destination, but as I said, beware taking too many cycles or it'll auto-flip to the front.

If decompress time is too long to use vram without the auto-flip, maybe break your data up into smaller blocks, so that you can decompress a block this frame, then draw a loading screen or your game screen on top of the scratch data, flip, and move to the next block. You lose a little compression by breaking into smaller blocks, but everything has its cost.

P#52010 2018-04-25 12:39 ( Edited 2018-04-25 16:39)

It's okay - I'm a bloody idiot
...I wasn't declaring a "_draw()"
#facepalm

It was @Felice's comment about tweetcarts that made me think/remember that the rules are different without the main functions present. (I'd just build on the PX8 sample above, for testing - which didn't have them either).

I'm happy to say that it now works!
And what's even better is, I can now save loads of tokens using the original PSET/PGET method - as seems that's also fast enough! :D
#WinnerWinner

Thanks all ;o)

P#52016 2018-04-25 15:19 ( Edited 2018-04-25 19:21)

It sounds like you have the answers already but FWIW, here's a token-optimized* version of decomp that supports writing to (and reading from) any location in memory.

*= at face value the whole thing uses a few more tokens than the original, but if you need to call decomp more than once in your code you'll start saving tokens (as well as having generic pixel read/writes). As an added bonus you can use the single-number-parameter-as-string optimization, e.g. decomp"384" to save another token per call.

It comes with a few caveats and gotchas though.

  • It's been lifted out of a project I'm working on, and in its current state it probably won't work if it's dropped into the px8 cart for testing without a few alterations.

  • It's not optimized for speed, though I doubt it makes too much difference compared to the original. (especially if you do it at 'load time'. readpixel() is ok for light use at runtime as well, but it became a tradeoff between speed and tokens)

  • The x offset parameter is not supported. The images & data I'm compressing are all 128 x whatever -sized so I didn't really have a use for it.

  • The py parameter is a little weird & I keep confusing myself with it, but is essentially a y coordinate starting from the top left of sprite memory and going up (well, down) from there, as if you were using sset/sget to read all of memory. Alternatively it's an address in memory divided by the width of the screen in bytes, i.e. addr/64, with the restriction that the address must be a multiple of 64. For example to write to the start of screen memory (0x6000), you would use decomp(384), writing to the start of audio memory would be decomp(196) and so on. I'm not sure x offsets are supported properly, so fractional values probably won't work as expected.

  • I moved the getval function out into readbits() as I found it useful for other things as well. Some of the variables have been renamed to suit my purposes (readhead==src, databit==bit, databyte==byte). It also reads from the top of cart memory to the bottom, so the math is inverted, but it's otherwise equivalent to (and probably should be replaced with) the original getval().

code:

function readpixel(x,y)
 local byte=peek(x/2+y*64)
 return band(0xf,x%2==0 and byte or byte/16)
end

function readbits(bits)
 local val=0
 for i=0,bits-1 do
  if databit==256 then
   readhead-=1
   databit,databyte=1,peek(readhead)
  end
  if (band(databyte,databit)>0) val+=2^i
  databit*=2
 end
 return val
end

function px8decomp(py)
 local w,h,cbits,rmp,maxci,bpp,clist,pn,i,span,x,y,col,pc=readbits"8",readbits"8",readbits"3",readbits"1",readbits"8",readbits"3"+1,{},{},0,0
 for i=0,maxci do
  clist[i]=readbits(bpp)
 end
 while i<w*h do
  local bl=0
  while readbits"1"==0 do
   bl+=1
  end
  for j=0,shl(min(1,bl),bl)+readbits(max(1,bl)) do
   local idiv64,wdiv8=i/64,w/8
   local i1=rmp==1 and flr(idiv64%wdiv8)*8+i%8+(flr(idiv64/wdiv8)*8+flr(i%64/8))*w or i
   x,y=i1%w,py+flr(i1/w)
   local t,l=y!=0+py and readpixel(x,y-1)/16 or 0,x!=0 and readpixel(x-1,y)*16 or 0
   pc=pn[t+l] or pn[t] or pn[l]
   if span%2==0 then
    local index=0
    repeat
     v=readbits(cbits)
     index+=v
    until v<2^cbits-1
    local pindex=999
    for i=0,maxci do
     if (pc==clist[i]) pindex=i
    end
    if (pindex<=index) index+=1
    col=clist[index]
    for i=index,1,0xffff do
     clist[i]=clist[i-1]
    end
    clist[0]=col
   else
    col=pc
   end
   local paddr,x2=x/2+y*64,x%2==0
   poke(paddr,bor(band(0x0f,x2 and col or peek(paddr)),band(0xf0,x2 and peek(paddr) or col*16)))
   pn[t],pn[l],pn[t+l]=col,col,col
   i+=1
  end
  span+=1
 end
 databit=256
end

NB 'setpixel()' has been inlined in decomp() as I don't use it anywhere else in my code - you may want to change that or use a faster version.

I use both option 1 & 2 in my project in pretty much the same ways as you described:

For #1, I hide things offscreen by switching to the 64x64 screen mode and then decompressing to & copying from the bottom (hidden) portion of the screen to sprite memory, before switching back to 128x128. You could also use the top left corner of a 128x128 image as a cheap splashscreen :)

For #2, I store a 128x128 image starting at address 0x3f00 (or lower if you need to use save data) and accepted that I'd have fewer sfx slots to work with. You still get ~44 though IIRC so you're unlikely to blow that budget for all but the most complex audio. Though you do have to be careful about which slots are used, or write some tools to pack & shuffle sfx around.

Hope that helps.

edit: If anyone can squeeze it further I'd love to know. I can see one more token up for grabs in there now I've read my own post back, so it's probably not that optimized after all.

P#52033 2018-04-25 21:59 ( Edited 2018-04-26 04:00)

@Catatafish: Thanks for this - will keep it in mind for the future

For now, I'm prob gonna try to squish/refactor the Decomp & Load_Gfx functions into one, with as few tokens as poss. But that's for another day! :o)

P#52065 2018-04-27 08:51 ( Edited 2018-04-27 12:53)

Hum...I am clearly doing it wrong but I don't get how such a simple test can be failing in such glory!
Top 64x64: sample image
Bottom 64x64: decompressed output :/

Cart #55478 | 2018-08-21 | Code ▽ | Embed ▽ | No License

Interesting part:

 -- sample image
 circfill(32,32,16,7)

 -- compress to gfx
 comp(0,0,64,64,0x0,pget)

 -- display back from gfx
 decomp(0x0,0,64,sget,pset)
P#55479 2018-08-21 16:43 ( Edited 2018-08-21 20:44)

Replace sget with pget & it should give you the correct results.

P#55485 2018-08-21 20:08 ( Edited 2018-08-22 00:08)

^^^ Wot @Catatafish said! ^^^
(Just tried it with your code snippet)

This seemed counterintuitive to me too, but (I think) the reason you use the paired get/set for the target decompression, is because it uses this to "read" back as it goes - part of the predictive nature of the routine?

I dunno - I'm already outta my depth! :D

P#55493 2018-08-22 01:18 ( Edited 2018-08-22 05:18)

Arhg! thanks.

P#55494 2018-08-22 01:25 ( Edited 2018-08-22 05:26)

(Apologies in advance for unrelated post)

This reminds me of working with SmileBasic; it could only save images so I jiggered some level data into being converted to an image specifically for saving...anyway cool stuff!

P#58743 2018-11-03 19:13 ( Edited 2018-11-03 23:13)

UPDATE!

PX8 has been replaced by PX9: https://www.lexaloffle.com/bbs/?tid=34058

If you're starting a new project, you probably just want to use PX9 instead. Although they use the same interface, I've given them separate names and threads to avoid confusion.

P#63990 2019-04-26 18:44

[Please log in to post a comment]

Follow Lexaloffle:          
Generated 2024-03-19 10:28:49 | 0.050s | Q:59