Hi, I am new to Pico-8 and I love it.
Yesterday I tried to benchmark filling every pixel individually. At first I used pset, with something very simple like pset(x,y,c) where c = x + y + nframe and nframe += 1 every update. I got 15fps, then 30fps when I put the expression inside the call, like pset(x,y,x+y+nframe).
Then I tried to see whether pset per pixel matters. I tried poking at 0x6000, and of course it's faster because I was poking two pixels at once. But if one still needs to control individual pixels, one has to do bit manipulation with band and multiply by 16, so it will be slow again. It would work for copying screen areas with memcpy, like a game I had seen that makes cool glitches.
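For the single-pixel case, here is a minimal sketch of what that read-modify-write could look like. It assumes the usual screen layout (64 bytes per row starting at 0x6000, with the left pixel of each pair in the low nibble). `put_nibble` and `pset_poke` are made-up names for illustration, and plain arithmetic is used instead of band.

```lua
-- return byte b with one of its two 4-bit pixels replaced by colour c
-- left=true replaces the low nibble (even x), false the high nibble (odd x)
function put_nibble(b, left, c)
  c = c % 16
  local lo = b % 16
  local hi = (b - lo) / 16
  if left then lo = c else hi = c end
  return hi * 16 + lo
end

-- set one pixel by poking screen memory (pico-8 only: uses peek/poke/flr)
-- assumption: left pixel of each byte is stored in the low nibble
function pset_poke(x, y, c)
  local addr = 0x6000 + y * 64 + flr(x / 2)
  poke(addr, put_nibble(peek(addr), x % 2 == 0, c))
end
```

Each pixel costs a peek, a poke and some arithmetic here, so this is unlikely to beat pset; poking only pays off when both pixels of a byte are written together.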
Then I tried to make a buffer:
vram = {}
for i=1,8192 do
vram[i] = fill with stuff..
end
But memcpy(0x6000,vram,8192) did nothing (black screen).
Is there a symbol to get the pointer of vram? I also tried passing vram[0] and vram[1] to the function.
Now if I copy some of the contents to 0x3400 and then do memcpy(0x6000,0x3400,...) it works, so my array has stuff.
Do I need a special symbol before vram, like &vram? I couldn't find out for sure.
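As far as I can tell there is no address-of operator: memcpy takes numeric addresses only, and a Lua table does not live in the addressable memory. A sketch of the workaround, where `table_to_ram` is a made-up helper name and the staging address in the usage comment is only an example:

```lua
-- write buf[1..#buf] (byte values 0..255) to consecutive addresses from dest
function table_to_ram(buf, dest)
  for i = 1, #buf do
    poke(dest + i - 1, buf[i])
  end
end

-- usage inside pico-8: stage the bytes in ram, then block-copy to the screen
-- (assumption: 0x0000 is sprite memory, so staging there overwrites sprites)
-- table_to_ram(vram, 0x0000)
-- memcpy(0x6000, 0x0000, 8192)
```

Staging costs one poke per byte, the same as poking the screen directly, so this only helps when the buffer is filled once and then copied over many frames.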
So, in the end my question is: is it worth having a separate buffer (and does it fit in memory?) of 8k or even 16k chunky pixels, and copying from there to vram? Or is pset the only thing I need for speed? After all, I am not sure whether pset was slowing things down, or simply that even c = x+y+nframe took enough time to drop from 30fps to 15fps.
The other thing is to just reduce resolution. Going from a 128*128 plasma to a 64*64 blocky one (either with 4 psets or with the rect function) I went from 15fps to 30fps. Well, 15fps is still not bad for a full pixel effect on this tiny machine, I just want to investigate whether it's possible to improve performance in pixel effects or whether I am searching in vain.
Are there any other techniques that can improve performance in general? I did my plasma test with sines precalculated at init, although I haven't tested whether the sin function is slow or not. I was searching for a thread on common pitfalls in Pico-8 performance, what to avoid, what to prefer.
Also, I guess there is no emulated CPU. But is there a restriction on speed, like counting imaginary cycles for every token and having a maximum theoretical number of cycles per frame? Or does Pico-8 run as fast as possible, so that a subsequent version of Pico-8 might be more optimized and change the speed of some cartridges? So, maybe some effect I code now will run faster in the future because of improvements or a faster PC processor? How does this work when the cartridges run in the browser?
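For reference, a sine table of the kind described can be sketched like this. It relies on Pico-8's sin() taking turns (0..1) rather than radians; `make_sintab`, the table size and the 0..15 scaling are assumptions for illustration, not the original poster's code.

```lua
-- build a lookup table of n sine samples, scaled to colour indices 0..15
-- pico-8's sin() takes turns (0..1), not radians
function make_sintab(n)
  local tab = {}
  for i = 0, n - 1 do
    tab[i] = flr(sin(i / n) * 7.5 + 8)
  end
  return tab
end
```

A plasma loop would then index the table with something like tab[(x + y + nframe) % n] instead of calling sin per pixel. Whether that is actually faster would need measuring.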
From what I know, there is no way to keep 30fps while filling every screen pixel separately. You will have to group some pixels together and use native functions like rectfill or circfill, as they are really fast. In general, using local variables is faster than using globals or table members.
What you say about being faster once inside the pset is interesting though, I never tested that.
A separate buffer can fit in memory in two parts (see the cart "Shodo" for example) but it will probably not improve your framerate. It can be useful for not redrawing everything each frame while still drawing 30fps dynamic elements above it.
I think there is a native 64x64 mode that has been "found" (search for "screen mode hack" on the BBS), so it's probably faster than drawing 2x2 rects.
The CPU limitation is, as far as I know, arbitrary. Each piece of code probably has an arbitrary cost attached, and when the sum goes beyond 100%, the program slows down. In the web player it's often slow even below the limit, probably because of JavaScript.
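The mode in question is usually switched on with a single poke to a draw-state byte. This is undocumented behaviour as far as I know, so treat the address 0x5f2c and the value 3 below as assumptions that may change between versions; `lowres_on` and `lowres_off` are made-up names.

```lua
-- switch to the stretched 64x64 mode (assumption: 0x5f2c = 3 selects it)
function lowres_on()
  poke(0x5f2c, 3)
end

-- back to the normal 128x128 mode
function lowres_off()
  poke(0x5f2c, 0)
end
```

With the mode on, only coordinates 0..63 are shown and each one is stretched to a 2x2 block, so a full-screen effect needs a quarter of the pset calls.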
ncolor = 0
local idx = 0x6000
for y=0,127 do
  for x=0,127 do
    poke(idx, (x+1)*16+x+y*17+ncolor)
  end
end
Wow, the special modes hack is nice. I wonder if it was intended.
I will also try to get Shodo; I get why it needs a backbuffer. Now I need to figure out how to download cartridges and load them on my computer. I didn't see any download link there, but I heard something about cartridge data being hidden in a PNG image, and I see I can download the cartridge image.
That code doesn't do exactly what I want.
One would have to do this to be complete, otherwise the values overflow beyond 4 bits and come out wrong, for example when I add colors for plasma. Unless I want a noisy, dirty plasma.
poke(idx, band(x+1+y+ncolor,15) * 16 + band(x+y+ncolor,15))
Actually, poking to vram would still work for some effects, like texture mapping, since one can store bitmaps in 4-bit format so they don't need the band. Or make a 64*128 effect with wide pixels and forget about combining pixels altogether.
Ok, you are right asterick, thanks, it seems I hadn't fully tested poke. Here is some code I made from your sample:
local idx = 0x6000
for y=0,127 do
  for x=0,63 do
    local p1 = (y + x*2)%16
    local p2 = (y + x*2+1)%16
    local val = p1 + p2*16
    local pos = idx+x+y*64
    poke(pos,val)
  end
end
It runs at around 80% CPU, which is better than the pset version (around 110%).
Using incremental values can bring it down to around 50% CPU:
local cp = 0x6000
local vv = 0
for y=0,127 do
for x=0,63 do
local p1 = vv%16
local p2 = (vv+1)%16
local val = p1 + p2*16
poke(cp,val)
cp += 1
vv += 2
end
vv += 1
end
So contrary to my belief, it's quite possible to do that. Now doing something beautiful with it is the challenge.
Heh, I totally forgot about the modulo. I used to optimize tiling with an and against 2^n-1 masks, but here it's the opposite: modulo is faster than the band call, of course.
Thanks guys.
I also didn't know the local keyword.
And I just found the stat(1) function. I was using my own fps counter before, but it always read 30 or 15 of course. Now the numbers make more sense.
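A small sketch of how stat(1) can be put on screen while testing. stat(1) returns the CPU use as a fraction of the frame budget, so 1.0 corresponds to the 100% mentioned earlier in the thread; `cpu_percent` and `draw_cpu` are made-up helper names.

```lua
-- cpu use of the current frame as a whole percentage
-- stat(1) is a fraction: 1.0 means the whole frame budget was used
function cpu_percent()
  return flr(stat(1) * 100)
end

-- call at the end of _draw() to overlay the number in the top-left corner
function draw_cpu()
  print(cpu_percent() .. "%", 0, 0, 7)
end
```

Unlike a frame counter that can only read 30 or 15, this shows how close an effect is to the limit before the framerate actually drops.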
p.s. I noticed % not always working in some other code I am working on, not sure if it's a bug or something I am doing wrong. I will check again later.
EDIT: Ok, I figured out why % was not working. Numbers are 16:16 fixed point, and I was applying modulo to some fractional math values. I should flr. Although then I lose accuracy, so I'll revert. It's actually cool that numbers are fixed point.
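One way to keep the accuracy, sketched under the assumption that the fractional value is only turned into a colour at the very end: leave the running value untouched and floor just the copy that goes through the modulo. `to_colour` is a made-up name.

```lua
-- map a fractional value onto the 16 palette indices
-- flr first, so the result is a whole colour number rather than e.g. 3.5
function to_colour(v)
  return flr(v) % 16
end
```

For example, 19.5 % 16 gives 3.5, which is not a whole colour index, while to_colour(19.5) gives 3 and the original 19.5 can keep accumulating between frames.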