Cart #rld_conway-2 | 2021-09-16 | License: CC4-BY-NC-SA

With the release of PICO-8 0.2.3 the code slowed down a bit and exceeded 100% CPU, so I lowered the maximum fps from 420 to 390.

Controls: change color with left and right, change speed with up and down.
You can change the initial board by changing the spritesheet. Use colors 0 and 7.

The board is stored as a bitmap, 1 bit per cell at address 0x4300.
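
As a rough illustration of that layout, the sketch below reads and writes single cells. The row width of 128 cells, the row-major order and bit 0 of each byte being the leftmost of its 8 cells are assumptions, and `cell_addr`, `get_cell` and `set_cell` are hypothetical helpers, not functions from the cart.

```lua
-- assumed layout: 128 cells per row (16 bytes), row-major,
-- bit 0 of each byte = leftmost of its 8 cells
local board = 0x4300

-- address of the byte that holds cell (x, y)
function cell_addr(x, y)
 return board + y * 16 + (x \ 8)
end

-- 1 if the cell is alive, 0 otherwise
function get_cell(x, y)
 return (peek(cell_addr(x, y)) >> (x & 7)) & 1
end

-- set or clear one cell, leaving the other 7 cells of the byte alone
function set_cell(x, y, alive)
 local a = cell_addr(x, y)
 local mask = 1 << (x & 7)
 if alive then
  poke(a, peek(a) | mask)
 else
  poke(a, peek(a) & ~mask)
 end
end
```

The per-cell helpers are only meant to show the addressing; the update described below never touches cells one at a time.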

Updating the board:
32 cells are processed in parallel using bitwise operations.
The bits are added together using the following functions:

 function add2(a, b)
  sum = a ^^ b
  carry = a & b
  return sum, carry
 end

 function add3(a, b, c)
  tmp = a ^^ b
  sum = tmp ^^ c
  carry = a & b | tmp & c
  return sum, carry
 end
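
To make the "in parallel" part concrete, here is a small check that treats every bit position as its own lane. The input values are arbitrary, and add3 is repeated (with locals) only so the snippet runs on its own.

```lua
-- full adder applied to every bit position at once
function add3(a, b, c)
 local tmp = a ^^ b
 return tmp ^^ c, a & b | tmp & c
end

-- lane:          3210
local a = 0b0111
local b = 0b0011
local c = 0b0001
-- ones per lane: 0123

local sum, carry = add3(a, b, c)

-- sum holds the low bit of each lane's count, carry the high bit
assert(sum == 0b0101)
assert(carry == 0b0011)
```

Lane 0 counts three ones (sum 1, carry 1), lane 1 counts two (sum 0, carry 1), lane 2 counts one (sum 1, carry 0), and no bit ever leaks into a neighboring lane.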

Let's use the following map to refer to the neighboring cells:

 abc -- above
 d.e -- current
 fgh -- below

We get the sum of b+g using add2(above, below).
We get the sum of a+d+f and the sum of c+e+h from a single add3(above, current, below), which adds the three rows column by column. To line up a+d+f or c+e+h with the current cell we shift that result left or right by 1.
Now we have the 3 sums as 2-digit binary numbers:

 sum     bit0  bit1
 b+g:    s0    c0
 a+d+f:  s1    c1
 c+e+h:  s2    c2
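
A minimal sketch of how these six values could be produced for one word. It assumes that a cell's left neighbor sits one bit lower in the word, and it ignores the bits that would have to be carried in from the neighboring words at both ends; the row values are arbitrary example data.

```lua
-- full adder applied to every bit position at once
function add3(a, b, c)
 local tmp = a ^^ b
 return tmp ^^ c, a & b | tmp & c
end

-- three rows of one board word (arbitrary example data)
local above   = 0b1100
local current = 0b1000
local below   = 0b1110

-- b+g: the cells directly above and below (this is add2)
local s0, c0 = above ^^ below, above & below

-- live cells per column over the three rows, 2 bits per column
local s, c = add3(above, current, below)

-- assumed bit order: shifting left by 1 brings the left column
-- under the current cell, shifting right brings the right column
local s1, c1 = s << 1, c << 1     -- a+d+f
local s2, c2 = s >>> 1, c >>> 1   -- c+e+h

assert(s == 0b1010 and c == 0b1100)
assert(s1 == 0b10100 and c1 == 0b11000)
assert(s2 == 0b0101 and c2 == 0b0110)
```

The logical shift `>>>` is used so that the top bit is not sign-extended. Since PICO-8 numbers are 16.16 fixed point, a right shift moves integer bit 0 into the fractional bits, which is fine when all 32 bits are cells.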

We add these numbers one column at a time to get bit0 and bit1 of the final sum:

 bit0, c = add3(s0, s1, s2)
 sum, carry = add3(c0, c1, c2)
 bit1 = sum ^^ c
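
The column-wise addition can be checked by brute force on a single lane. In this sketch `combine` is a hypothetical wrapper around the three lines above. Note that carry is not a complete bit2: the carry out of sum + c is dropped, but whenever that happens bit1 is 0, so the case is still recognisable as "not 2 or 3".

```lua
-- full adder applied to every bit position at once
function add3(a, b, c)
 local tmp = a ^^ b
 return tmp ^^ c, a & b | tmp & c
end

-- add the three 2-bit partial sums one column at a time
function combine(s0, c0, s1, c1, s2, c2)
 local bit0, c = add3(s0, s1, s2)
 local sum, carry = add3(c0, c1, c2)
 return bit0, sum ^^ c, carry
end

-- n0 = b+g (0..2), n1 = a+d+f (0..3), n2 = c+e+h (0..3)
for n0 = 0, 2 do
 for n1 = 0, 3 do
  for n2 = 0, 3 do
   local n = n0 + n1 + n2
   local bit0, bit1, carry = combine(
    n0 & 1, n0 \ 2, n1 & 1, n1 \ 2, n2 & 1, n2 \ 2)
   assert(bit0 == (n & 1))
   assert(bit1 == ((n \ 2) & 1))
   -- carry is only ever set for 4 or more neighbors
   assert(carry == 0 or n >= 4)
  end
 end
end
```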

A cell is alive in the next generation if:

1. it has 3 neighbors, or
2. it's alive now and has 2 neighbors.

In the 1st case: 3 is 11 in binary, so the sum is 3 if bit0=1, bit1=1 and carry=0 (carry=0 rules out sums of 6 and 7, which also have bit1 set). The formula is: bit0 & bit1 & ~carry
Similarly, in the 2nd case (2 is 10 in binary) the formula is: current_cell & ~bit0 & bit1 & ~carry
Combined, the result is: bit0 & bit1 & ~carry | current_cell & ~bit0 & bit1 & ~carry
Factoring out the common terms gives (bit0 | current_cell & ~bit0) & bit1 & ~carry, which is equal to (bit0 | current_cell) & bit1 & ~carry
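
As a sanity check of that simplification, the sketch below compares the short and long forms for every single-lane input. `next_state` is a hypothetical name for the final formula.

```lua
-- the simplified rule; works on all lanes of a word at once
function next_state(current, bit0, bit1, carry)
 return (bit0 | current) & bit1 & ~carry
end

-- compare with the long form for every single-lane input
for current = 0, 1 do
 for bit0 = 0, 1 do
  for bit1 = 0, 1 do
   for carry = 0, 1 do
    local long = (bit0 & bit1 & ~carry) | (current & ~bit0 & bit1 & ~carry)
    -- alive next turn: count is 3, or count is 2 and alive now
    local alive = carry == 0 and bit1 == 1 and (bit0 == 1 or current == 1)
    local expect = alive and 1 or 0
    assert(next_state(current, bit0, bit1, carry) == long)
    assert(long == expect)
   end
  end
 end
end
```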

To speed up the main loop the add2 and add3 functions are inlined, redundant computations are removed and the loop is unrolled 4 times.
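
For illustration, this is roughly what an inlined update of one word could look like. It is a sketch, not the cart's actual loop: `step_word` is a hypothetical name, bits crossing into neighboring words are ignored, unrolling is not shown, and the shared subexpression is only one guess at what "redundant computations" covers. Passing the rows to add3 in the order (above, below, current) lets it reuse both results of add2(above, below).

```lua
-- next generation of one word, with add2/add3 written out inline
function step_word(above, current, below)
 -- add2(above, below): b+g
 local s0 = above ^^ below
 local c0 = above & below
 -- add3(above, below, current) reusing s0 and c0
 local s = s0 ^^ current
 local c = c0 | s0 & current
 -- column sums lined up with the current cell
 -- (assumes the left neighbor is one bit lower)
 local s1, c1 = s << 1, c << 1
 local s2, c2 = s >>> 1, c >>> 1
 -- add3(s0, s1, s2)
 local t = s0 ^^ s1
 local bit0 = t ^^ s2
 local k = s0 & s1 | t & s2
 -- add3(c0, c1, c2), then fold in the carry k from bit0
 t = c0 ^^ c1
 local bit1 = t ^^ c2 ^^ k
 local carry = c0 & c1 | t & c2
 return (bit0 | current) & bit1 & ~carry
end

-- a horizontal blinker turns vertical
assert(step_word(0, 0b01110, 0) == 0b00100)
assert(step_word(0, 0, 0b01110) == 0b00100)
```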

Drawing:
Each byte of the board (8 cells) is expanded to 32 bits (8 pixels at 4 bits per pixel) using a lookup table, and the result is poked to screen memory.
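
A minimal sketch of such a lookup table, assuming live cells are drawn in color 7, bit 0 of a board byte is its leftmost cell, and a board row is 16 bytes. `lut` and `draw_row` are hypothetical names; how the cart handles color changes is not shown here.

```lua
-- assumed drawing color for live cells (dead cells are color 0)
local col = 7

-- lut[b] = 32-bit value holding the 8 pixels for board byte b
lut = {}
for b = 0, 255 do
 local v = 0
 for i = 0, 7 do
  if ((b >> i) & 1) == 1 then
   -- col >>> 16 puts the color in the lowest nibble of the
   -- 16.16 value; each pixel then moves up by 4 bits
   v |= (col >>> 16) << (i * 4)
  end
 end
 lut[b] = v
end

-- expand one board row (16 bytes) into one screen row (64 bytes)
function draw_row(y)
 local src = 0x4300 + y * 16
 local dst = 0x6000 + y * 64
 for i = 0, 15 do
  poke4(dst + i * 4, lut[peek(src + i)])
 end
end
```

This relies on poke4 writing its value in little-endian order and on the low nibble of each screen byte being the left pixel.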

P#94115 2021-06-27 00:20 ( Edited 2021-09-16 10:46)

FAST

P#94116 2021-06-27 00:21

excellent binary operation use case 👌

question: why use the 0x4300 region and not a plain table?
a table is faster to access than $(address+8).

P#94127 2021-06-27 06:09

I tested both table and memory, and using memory is faster.
I guess poke4(address,a,b,c,d) is faster than table[i],table[i+1],table[i+2],table[i+3]=a,b,c,d. Also, when drawing I need to access individual bytes; with a table I would have to shift the 32-bit numbers and mask by 0xff to access the bytes.
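
To show what I mean by the shifting and masking, here is a small comparison. `addr`, `t` and the stored value are made up for the example.

```lua
local addr = 0x4300
poke4(addr, 0x1234.5678)
local t = { 0x1234.5678 }

-- memory: every byte can be read directly (little-endian order)
local m0 = peek(addr)
local m1 = peek(addr + 1)
local m2 = peek(addr + 2)
local m3 = peek(addr + 3)

-- table: move the wanted byte into the low 8 integer bits, then mask
local v = t[1]
local b0 = (v << 16) & 0xff
local b1 = (v << 8) & 0xff
local b2 = v & 0xff
local b3 = (v >>> 8) & 0xff

assert(m0 == 0x78 and m1 == 0x56 and m2 == 0x34 and m3 == 0x12)
assert(b0 == m0 and b1 == m1 and b2 == m2 and b3 == m3)
```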

P#94135 2021-06-27 12:23

got it - I hadn’t seen the individual poke when glancing over the code!

P#94149 2021-06-27 18:07

Cart #rld_conway_blur-0 | 2021-09-16 | No License

This is a version that runs at 210 fps, but has smoother animation. Use up or down to change speed.

P#97393 2021-09-16 10:49 ( Edited 2021-09-16 13:57)