Fading Stars Demo
Demo showing the combination of the stars and fade effect I used in PICO Space: https://www.lexaloffle.com/bbs/?tid=42279.
[UPDATE 2021-4-12: 8bit and 16bit cached modes (explanation below and in code), some other small tweaks]
The stars are just simple particles that have x,y,z coordinates.
In this demo I use a couple of sin functions to give them some movement combined with a divide by the z coord for a bit of parallax. In game, I feed in the player's position.
Then I clamp the resulting x,y values to the screen with the modulus operator so they're always visible (%128). It does mean that the same stars go past constantly, but otherwise I was processing a lot of particles that don't get seen very often (not aiming for realism here).
This works by mapping the colour of every pixel on the screen to another colour that tends to a target e.g. 0/black.
You can use a similar mapping with the pal(x,1) function to e.g. do fades to black between screens etc. but that fades everything including anything drawn that frame.
In this demo I process the pixels already in screen memory so that the screen is faded by a step, then draw fresh stars on top of that.
It's pretty expensive to do the whole screen (IIRC about 90% of performance at 60fps) so I've set it up to do every fourth scan line, starting from a different point each frame. Effectively a quarter of the screen is faded at a time. It takes 4 frames to fade the whole screen one step.
I initially tried fading in quarter strips top to bottom, but the tearing on bigger objects like planets looked pretty bad.
Using the order 0,2,1,3 for the scanlines does some rough dithering to make the effect look a bit more uniform. A random value flr(rnd(4)) works quite well too, but is messier looking.
Since I found using poke4 to work on 8 pixels at a time was fastest (not surprising really) dithering horizontally is limited and isn't in the demo. Nevertheless, I keep meaning to try a "Z" pattern i.e.
I'm concerned it might cost too much more in performance/tokens for too little visual improvement.
Of course, as soon as I write about it the old subconscious starts working away and it takes 5 minutes to implement just that - a reverse N pattern as it turns out. Same performance, same tokens. See the new cart.
Pros of Effect
- You can draw whatever you want really and the effect essentially "just works" as a replacement for a cls().
- Cross-fading out from a scene just happens "free".
- very simple particles look much more complicated than they really are
- You can't draw anything that moves without the fade effect "catching" it. It can be mitigated by drawing around your objects (e.g. black borders), but if the view moves more than the width of the border you're out of luck.
- Conversely, the effect only works where you don't draw that frame - so if your game has e.g. a full-screen scrolling background that's drawn every frame then you won't see any effect at all. For a space game this isn't a huge problem, but it's still visible here and there.
- If nothing moves then there's no effect - try hacking the stars to be still in the demo.
- Performance cost is approx 21% at 60fps.
- Obv costs some tokens.
The effect works fine by extracting each pixel's colour value via shifting and masking then dumping the mapped values back onto the screen, but it's still pretty performance heavy.
When I was writing PICO Space I'd read a few times that procedurally generated content used a lot of memory so I didn't want to try anything like the following, but now I have a much better idea of the game's memory requirements I thought I'd give it a go.
Pixels in PICO-8's screen are determined by a 4-bit value, but peeking and poking only works with 8-bit granularity at best i.e. a pair of pixels or more at once. The mappings I have contain 16 values for each possible colour of a pixel.
Considering pairs of pixels instead of single pixels, there are 16 * 16 = 256 possible combination of colours that need to be mapped. Why not store a table with each of these values - it can't be that large, right?
Turns out it isn't, especially when compared to the 2MB of space lua is given in PICO-8. In fact the demo seems to only use about 2K or so (which is still a lot more than the 256 bytes it should take, but still pretty small).
This means that a lot of masking and shifting isn't as necessary inside the inner loop. It even takes fewer tokens. The performance improvement is enough that half or even all of the screen being processed per frame isn't too bad.
The next step was obviously to try mapping 4 pixels at a time using 16-bit values.
This would need a table of 16^4 = 65536 entries which isn't very big for a modern machine, but is pushing it pretty far for PICO-8. It's possible - take a look at the code. It also takes up a lot more memory: about 1200KB it seems. That's well over half of the total space available and for my purposes in PICO Space is enough to give me sporadic out of memory errors as it stands (PICO Space takes about 600-900KB depending on the size of the current galaxy and how much is going on in it at any particular moment). For other games it may be absolutely fine and it's tempting since there's about a 2x speed-up compared to my original implementation of the effect using this technique.
A bit too far
PICO-8's number format is 16bit.16bit fixed point so every value I've been storing so far is actually 32 bits in size whether I use all of those bits or not. Why not use them all?
Storing mappings for 8 pixels isn't going to work: 16^8 = 4,294,967,296 - a bit too much for PICO-8.
Instead, the last implementation that I've tried (so far) stores two 16-bit values in each number in the cache table so that the same amount of mapping values as in the previous section takes half the entries and hence half the space. The upper 16 bits take the even values; lower 16 bits the odd values.
This brings the memory usage down to about 600KB or so, which is fairly reasonable.
Unfortunately, the two mapping values packed into a single PICO-8 table value need to be unpacked to be used in the inner loop of the effect. By the time shifts and masks are applied to do this I couldn't get the performance to really be any better than the original effect (without any caching of values), never mind faster than the 8 bit or 16 bit table. So using this would be spending about 600K for no real benefit.
If you have any other ideas or know of optimisations I've missed then I'd love to hear about it below.
[Please log in to post a comment]