Cart [#40542#] | Copy | Code | 2017-05-13 | Link

Here's a demo effect kludged together from some doodles I had been playing with.
I'm quite fond of my new and improved 3D shading. (It's not quite as speedy as the line shading I was using previously, but it gives smoother gradients.)

-Electric Gryphon

P#40543 2017-05-13 15:29

::

To say it in the philosophical words of Ron Simmons: Damn!
This shading rocks beyond limits!

P#40555 2017-05-13 20:50

::

Very nice looking demo!

I like the sketch of Rocket too.

P#40613 2017-05-15 09:15

::

Wow! When's the full demo gonna be released? ;)

P#40630 2017-05-15 14:51

::

@electricgryphon you are awesome :D

P#40651 2017-05-16 09:13

::

Looking through your source for ideas (stole one already), I saw this:

    if(sx%2==1)then
        poke(start_a,bor(band(peek(start_a),0x000f),band(v,0x00f0)))

This is to fill the first pixel if a span starts on an odd pixel. You have sx and y, would it be faster just to use pset(sx,y,v/16)?
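
For anyone following along, here is a minimal sketch of what that line is doing, assuming the default PICO-8 screen layout (screen memory at 0x6000, 64 bytes per row, two pixels per byte with the left/even pixel in the low nibble). The names `screen_addr` and `put_odd_px` are made up for this illustration and are not from the cart, and it assumes v carries the colour in its high nibble, as the v/16 above implies.

```lua
-- hypothetical helpers for illustration, not from the cart
-- assumes default screen memory: 0x6000 base, 64 bytes per row

-- address of the byte that holds pixel (x,y)
function screen_addr(x,y)
 return 0x6000+y*64+flr(x/2)
end

-- write only the odd (right-hand) pixel of a byte:
-- keep the low nibble (the even pixel already on screen)
-- and take the high nibble from v
function put_odd_px(x,y,v)
 local a=screen_addr(x,y)
 poke(a,bor(band(peek(a),0x0f),band(v,0xf0)))
end

-- usage: colour 8 packed into both nibbles (0x88)
put_odd_px(3,10,0x88)
```

The peek is there because the byte is shared with the even pixel to the left, which must survive the write.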

Very spiffy work, regardless. :)

P#40688 2017-05-17 02:06

::

Nope, that poke/peek construct is six times faster than pset.

The built-in single-pixel primitives are INCREDIBLY slow due to handling the camera/clip functions.

P#40715 2017-05-18 05:21

::

I know pset is slow, but for a single pixel at the start of a span that is otherwise poke'd in, that is a ton of overhead, especially with the logic calls. Annoyingly, band/bor cost more than a simple op like +-*/ should, since they're implemented as an actual call, so they have call overhead.

P#40718 2017-05-18 07:00

::

Wait...

There's something seriously wrong with cycle counting...

I have a test bench that tells me exactly how many pseudocycles an instruction or bit of code takes to execute.

pset(sx,y,v) takes 5 cycles. I believe that's just the standard overhead for the call itself. The internal workings appear to be given to you free.

poke(start_a,bor(band(peek(start_a),0x000f),band(v,0x00f0))) takes 0 cycles.

There's no way that's right, and I know from a lot of testing that it ain't my test bench that's messing up.

The test bench is a simple process: it has a loop with a large number of iterations that takes exactly one second with no inner code. I know that a FOR loop has exactly one cycle of overhead per iteration, so each additional second of runtime indicates one more cycle added inside the loop. Putting pset() inside the loop increases total runtime to 6s, so it's 6-1=5 cycles for pset(). Putting that poke() sequence inside the loop makes it run for exactly one second.

I considered that the Lua compiler might be hoisting loop invariants out of the loop, which would result in a single poke() outside the loop, so I added this line above the poke:

    start_a=band(time(),0x1fff)+0x6000

By itself, that code comes in at 11 cycles, meaning the loop now has a 12s overhead.

With the complex poke(), it's still 12s / 11cyc. There's no chance it's being loop-invariant'ed anymore. I know the poke is happening inside the loop since I can see the actual pixels getting written to the screen as the 12s pass. So... yeah, the poke() is basically coming in as free...

WTF?

Why is a poke with all that math inside of it coming in at 0 cycles? Can I put my entire game in the argument to a poke() call and have it run at native speed? :)

P#40720 2017-05-18 07:30

::

Does your test bench measure the stat(1) accounting as well? This seems pretty serious…

P#40728 2017-05-18 14:09

::

Wow…

    poke(start_a,bor(band(peek(start_a),0x000f),band(v,0x00f0)))

takes half the stat(1) time of

    poke(start_a,1)

when run in a loop. Very strange!
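
A comparison along these lines could be set up roughly as follows. This is only a sketch, not the actual test that was run: `start_a`, `v` and the iteration count `n` are made-up values, and it assumes stat(1) keeps accumulating within a frame so that the difference between two readings reflects the cost charged for the code in between.

```lua
-- rough sketch, not the poster's actual test
-- start_a, v and n are illustrative values
local start_a,v,n=0x6000,0x88,1000

local c0=stat(1)
for i=1,n do
 -- the masked read-modify-write from the cart
 poke(start_a,bor(band(peek(start_a),0x000f),band(v,0x00f0)))
end
local c1=stat(1)
for i=1,n do
 -- plain poke for comparison
 poke(start_a,1)
end
local c2=stat(1)

-- each value is a fraction of one frame's cpu budget
print("complex: "..(c1-c0))
print("simple:  "..(c2-c1))
```

Both loops carry the same loop overhead, so any gap between the two numbers comes from the loop bodies.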

P#40729 2017-05-18 14:14

::

So, it turns out bor, band, peek, poke, and other functions like bnot all have a negative cost. I don't understand why this is, but stepping through with a debugger, these built-in functions all decrease the "CPU use" value in memory by 3 each time they're called. The only reason why you can't decrease the value to zero is the "instruction_limiter" function, which adds 1024 to the "CPU use" value every so often.

Anyway, here's a cart that demonstrates the issue:

Cart [#40733#] | Copy | Code | 2017-05-18 | Link

P#40734 2017-05-18 16:30

::

I wonder if zep was trying to compensate for the difference between a regular operator (1 cycle) and the hacked-in 2-arg calls for binary operators (um, 5 cycles, I think?), and either overcompensated, or adjusted global timing at some point and forgot to adjust the compensation.

I could have sworn the binary ops were more expensive until recently, but I may be thinking of the token cost, rather than the cycle cost. Token cost is a lot more apparent, after all.

P#40764 2017-05-19 20:00

::

Cart [#40783#] | Copy | Code | 2017-05-20 | Link

This can definitely be abused, but seems to only really work when viewed from the pico-8 executable as opposed to the web browser.

If you press the z button, the frame rate goes up (becoming smooth) and the stat(1) value drops from 2.5 to 0.6. (In the browser, the stat(1) value changes, but the frame rate stays choppy.)

    if(btn(4))then
        for i=0,5000 do
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
            bnot(bnot(bnot(bnot(bnot(bnot(bnot(1)))))))
        end
    end

I was able to get to a negative stat(1) by raising the loop count to 10000. This clearly means that frames are being sent back from the future. :-)

P#40784 2017-05-20 02:26
