Hi there,
I'm looking for a way to optimize my script a bit.
I have a projected shadow on some of my sprites that is pretty dumb: I read every pixel of the sprite and print a pixel of shadow at a given offset.
That means I read every pixel of every sprite that needs a shadow.
Some optimization is needed because when I try to display shadows for a huge set of sprites, the frame rate drops.

I was wondering if using Peek and Poke would be faster than Pget and Pset?
Any idea if it would speed up the thing?

Thanks.

P#33724 2016-12-19 16:09 ( Edited 2017-01-26 00:05)

Yes, and using memset(...) on entire rows would probably be even faster. But either way you have to make sure that the rows of your projected shadow contain an even number of pixels aligned on an even boundary (x%2==0), since each byte of memory covers two pixels, with the left pixel in the low nibble: p0 + (p1<<4). If not, you'd have to clean up the ends of each row with a special pset() operation.
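
To make the two-pixels-per-byte layout concrete, here is a minimal sketch of the address and nibble arithmetic. The helper names (screen_addr, pack, split_byte) are made up for illustration. It assumes that screen memory starts at 0x6000 with 64 bytes per row, and that the left pixel of each pair sits in the low nibble.

```lua
-- flr() is pico-8's floor; fall back to math.floor so the
-- arithmetic can also be checked in a stock lua interpreter
local flr = flr or math.floor

-- address of the byte holding screen pixel (x, y)
-- 128 pixels per row / 2 pixels per byte = 64 bytes per row
function screen_addr(x, y)
    return 0x6000 + y * 64 + flr(x / 2)
end

-- combine a left pixel p0 and a right pixel p1 (0..15 each)
function pack(p0, p1)
    return p0 + p1 * 16
end

-- split a byte back into its left and right pixels
function split_byte(b)
    return b % 16, flr(b / 16)
end
```

In a cart, something like poke(screen_addr(x, y), pack(c, c)) would then write two pixels of color c at once, as long as x is even.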

But the BEST way to do this is probably to use spr() and pal(): pal() to shift the sprite colors to your shadow color, then spr() to blit an offset copy of the sprite where the shadow should be. Then you draw the original sprite on top of it.
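
A rough sketch of that spr() and pal() idea, for a single flat shadow color. The function name and its parameters are hypothetical, and it assumes color 0 is the transparent color (the default).

```lua
-- draw sprite n at (x, y) with a flat shadow offset by (dx, dy)
-- shadow_col is the single color used for the whole shadow
function spr_with_shadow(n, x, y, dx, dy, shadow_col)
    -- remap every opaque color to the shadow color
    -- (color 0 is transparent by default and stays that way)
    for c = 1, 15 do
        pal(c, shadow_col)
    end
    spr(n, x + dx, y + dy)

    -- restore the normal palette and draw the sprite on top
    pal()
    spr(n, x, y)
end
```

This costs two spr() calls per sprite instead of one read and one write per pixel, but it cannot pick a different shadow color per ground pixel.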

P#33725 2016-12-19 16:30 ( Edited 2016-12-19 21:36)

Thanks for the answer. I was thinking about using pal() but... I use a color ramp to fake the shadow: depending on the "ground" color, I use a different color for each shadow pixel.
Something like in this example.
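
For readers without the image, a color ramp like this can be sketched as a 16-entry lookup table. The values below are placeholders, not the ramp from the actual cart, and shade_byte is a hypothetical helper that darkens both pixels packed into one screen byte (left pixel in the low nibble).

```lua
-- flr() is pico-8's floor; fall back to math.floor in stock lua
local flr = flr or math.floor

-- placeholder ramp: ramp[c + 1] is the shadow color for ground
-- color c (illustrative values, not taken from the actual cart)
ramp = {
    0, 0, 1, 1,
    2, 1, 5, 6,
    2, 4, 9, 3,
    1, 1, 2, 5,
}

-- darken both pixels packed in one screen byte
function shade_byte(b)
    local left, right = b % 16, flr(b / 16)
    return ramp[left + 1] + ramp[right + 1] * 16
end
```

With peek and poke this would be used as poke(addr, shade_byte(peek(addr))), which only fits when both pixels of the pair are in shadow.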

I will then try to memcpy the pixels to user data, modify them, and memcpy them back to the screen.

P#33727 2016-12-19 16:45 ( Edited 2016-12-19 21:45)

Ah, I see—yeah, in that case, you'd have to use the memcpy method you propose. The other way you could go—if you're running at 60fps and don't mind a little flicker—is to blit the shadow every other frame so that it appears to blend into the background.

P#33729 2016-12-19 18:19 ( Edited 2016-12-19 23:20)

Oh. Smart. Wouldn't it also work at 30fps?
(the memcpy is not even needed as I can compute everything directly onscreen)

P#33730 2016-12-19 18:24 ( Edited 2016-12-19 23:26)

You could try it! At 30fps the flicker may be more apparent and distracting. OR, if you're locked to 30, you could try using two shadows with an alternating dither (i.e. every other pixel transparent) and just flip between them.
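
The alternating dither can be sketched as a checkerboard test that flips every frame. The function name is made up, and frame is assumed to be any counter that increases by one per drawn frame.

```lua
-- true when pixel (x, y) belongs to the shadow on this frame
-- the checkerboard pattern swaps its two halves every frame
function dither_on(x, y, frame)
    return (x + y + frame) % 2 == 0
end
```

Inside the shadow loop, pixels where dither_on() returns false would simply be skipped, so each frame only writes half of the shadow pixels.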

P#33731 2016-12-19 18:32 ( Edited 2016-12-19 23:37)

Well, with the poke and peek method the frame rate is better, but not THAT much better :(

P#33732 2016-12-19 18:54 ( Edited 2016-12-19 23:54)

30fps alternating scanline technique:

Cart #33733 | 2016-12-20 | License: CC4-BY-NC-SA

and 60fps alternating frame technique:

Cart #33734 | 2016-12-20 | License: CC4-BY-NC-SA

P#33735 2016-12-19 19:59 ( Edited 2016-12-20 01:00)

Another idea just occurred to me for 30fps shadows (and then I'm going to stop procrastinating) -- precompute 4 or 5 randomly dithered shadow frames and write them to the spritesheet, then cycle between them every frame, like so:

P#33739 2016-12-19 20:39 ( Edited 2016-12-20 01:39)

Couple of things:

  • Are you calculating the shadow for every pixel in the sprite (including the shadow pixels that end up covered by the sprite), or just the visible shadow edges?
  • Are you recalculating these shadow masks every frame, or are you caching them in a table first?
P#33740 2016-12-19 20:53 ( Edited 2016-12-20 02:04)

Thanks for your replies guys (and musurca for your examples and ideas).
Here is what I have so far.

The effect is fine, but, at some point with the rest of the gigantic map, the frame rate drops a little bit at the end of the map (I need a way to optimize the map loading / rendering too but it'll be in another post).

To answer your questions catatafish :
- Are you calculating the shadow for every pixel in the sprite (including the shadow pixels that end up covered by the sprite), or just the visible shadow edges?

I still compute the shadow for every pixel in the sprite (even the ones that'll be hidden by the sprite) BUT I compute shadows only for relevant sprites (border ones).

- Are you recalculating these shadow masks every frame, or are you caching them in a table first?

I recalculate them every frame... simply because as the background moves (and the shadow caster moves too) chances are that the color will change every frame too :S no ?

Thanks for your help.

P#33752 2016-12-20 05:37 ( Edited 2016-12-20 10:37)

Another suggestion if you are really gung ho about the look up table colors. Since you say this:

"I read every pixel of the sprite and print a pixel of shadow according to a given offset."

Looking at your image above, there are relatively few shadow pixels in it. For instance, you only have to output shadow pixels if the pixel in the original sprite is transparent. You also don't have to loop over the entire sprite. With some hand coded rects, you could cut down on a lot of reads. Easily 75% of them.

P#33795 2016-12-20 19:03 ( Edited 2016-12-21 00:03)

Experimenting with another idea to memcpy() the framebuffer into sprite memory and use pal() to perform the LUT. Should allow it to work in far fewer function calls. So far that part works, I just need to figure out how to apply the sprite's mask to the blitted chunk of the framebuffer. I think I should be able to do more palette fiddling and an extra blit though.

The benefit of doing it this way should be many fewer function calls. Most of the cost is per chunk copied, and not per pixel.

P#33796 2016-12-20 20:21 ( Edited 2016-12-21 01:21)

Kinda mostly working with masking!

P#33798 2016-12-20 22:03 ( Edited 2016-12-21 03:03)

Ok! Seems to work well. Probably not going to finish it enough to really test the performance well though...

-- copy a 32x16 rect
function blitshadow(dst, src)
    for i = 0, 15 do
        local offset = 64*i
        memcpy(dst + offset, src + offset, 16)
    end
end

-- load a lut that draws a sprite as black
function masklut()
    pal()
    for i = 0, 15 do
        pal(i, 0)
    end
end

-- the lut used by shadowlut()
lut = {
    0, 0, 1, 0,
    2, 0, 5, 6,
    4, 4, 9, 3,
    1, 5, 8, 6,
}

-- load the shadow lut
function shadowlut()
    pal()
    for i = 0, 15 do
        pal(i, lut[i + 1])
    end
end

-- apply a shadow to a 32x16 chunk of the screen
function shadowtile(x, y, maskf)
    local px, py = x*32, y*16
    clip(px, py, 32, 16)

    -- blit a clean copy to the sprite sheet
    local src = 0x6000 + px/2 + py*64
    blitshadow(2096, src)

    -- mask out the shadowed pixels and copy again
    masklut()
    maskf()
    blitshadow(3120, src)

    -- reset the camera, blit in screen space
    camera()

    -- draw the tile back to the screen with the lut applied
    shadowlut()
    spr(76, px, py, 4, 2)

    -- now draw the original pixels back with the shadowed parts masked out
    pal()
    spr(108, px, py, 4, 2)
end

local cx, cy = 0, 0
function _draw()
    cls(12)
    clip()
    pal()

    camera(cx, cy)

    -- draw something that you want to be shadowed...
    mapdraw(0, 0, 0, 0, 64, 64)

    local t = time()
    local sx, sy = 8 + 8*sin(t), 8 + 8*cos(t)

    -- this part could be vastly improved
    -- currently applies the shadowing to all pixels
    -- could either render many sprites in the mask function
    -- or render them one at a time, and only shadow the 32x16 tiles they overlap
    for j = 0, 7 do
        for i = 0, 3 do
            shadowtile(i, j, function()
                spr(0, sx + 4, sy + 4, 8, 8)
            end)

            -- need to reset the camera after calling shadowtile()
            camera(cx, cy)
        end
    end

    -- need to reset clip after calling shadowtile()
    clip()

    -- finally draw the sprite as usual
    spr(0, sx, sy, 8, 8)
end

Not sure how one submits a cart to an existing thread...

P#33799 2016-12-20 22:40 ( Edited 2016-12-21 03:41)

@lvictorino That approach makes sense.

To elaborate on what I was getting at - if you know which pixels of the sprite are casting visible shadows you can cache them, e.g. as relative coordinates, to avoid having to re-scan the sprite pixels every frame. Checking visibility is obviously more expensive than your current method but as it's a precalculation step it won't matter in-game.

This means the number of peek & poke calls you need to make (or whichever method you choose) is reduced to a minimum, and you're not spending cpu on pixels that are never going to be seen.
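
A minimal sketch of that precalculation, under a few assumptions: get(x, y) is a hypothetical accessor returning the sprite color at (x, y) (in a cart it could wrap sget()), color 0 is transparent, and (dx, dy) is the shadow offset in pixels.

```lua
-- build a list of visible shadow pixel offsets for a w x h sprite
-- get(x, y) returns the sprite color at (x, y); 0 is transparent
function build_shadow_cache(get, w, h, dx, dy)
    local function opaque(x, y)
        return x >= 0 and x < w and y >= 0 and y < h
            and get(x, y) ~= 0
    end

    local cache = {}
    for y = 0, h - 1 do
        for x = 0, w - 1 do
            local tx, ty = x + dx, y + dy
            -- keep the shadow pixel only if the sprite casts it
            -- and the sprite itself will not cover it
            if opaque(x, y) and not opaque(tx, ty) then
                cache[#cache + 1] = {tx, ty}
            end
        end
    end
    return cache
end
```

At draw time the loop then runs over the cached offsets only, shading the screen pixel at (sx + p[1], sy + p[2]) for each entry p, where (sx, sy) is the sprite position. One shadow pixel can appear more than once if several sprite pixels project onto it, so a stricter version would deduplicate the list.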

edit: @slembecke - nice work! I think you can just use 'save @clip' and paste it directly into the post.

P#33800 2016-12-20 22:58 ( Edited 2016-12-21 04:01)
:: Stompy

This is kind of unrelated to your question, but while we're all focusing on shadows, it seems your aircraft has a light source in the top left of the screen, while the ground map has a light source coming from the bottom left.

P#33801 2016-12-20 23:55 ( Edited 2016-12-21 04:55)

@slembecke : whoa nice work! I never used clip() (I don't even understand what it's for) but I'll look into it. Thanks.

@Catatafish : Ok, you're right. I don't know exactly why I haven't thought about caching visible shadow positions before. But it's an awesome trick, and very easy to implement. Thanks a lot, I'll try that.

@Stompy : You're right, but the effect is really weird when the shadow is set to reflect a bottom left light :S

P#33804 2016-12-21 02:02 ( Edited 2016-12-21 07:02)

I'm trying to do something similar to what's being discussed here.

You can see an example of what I'm trying to do here:
press x to toggle the light off/on to see the difference in performance when walking
My example project

So I went with trasevol_dog's approach, based on the following example. I had to make a bunch of tweaks to get it working in my project, but it doesn't run very well, and I've had a hard time getting the perf and memory usage to improve. [trasevol_dog's example](https://www.lexaloffle.com/bbs/?pid=34140#p34161)

Now I'm hoping to try this approach instead as it sounds like it'll be a lot better performance-wise. I really have no idea where to start with the bit blit stuff and copying memory around. Does anyone know of any resources I could look into for more explanation into how to do this? I tried following the code above but couldn't get very far with understanding how it works enough to adapt into my own project.

Thanks!

P#36679 2017-01-25 19:05 ( Edited 2017-01-26 00:08)
