With the release of PICO-8 0.2.3 the code became slightly slower and exceeded 100% CPU, so I lowered the maximum fps from 420 to 390.

Controls: change color with left and right, change speed with up and down.

You can change the initial board by changing the spritesheet. Use colors 0 and 7.

The board is stored as a bitmap, 1 bit per cell at address 0x4300.

Updating the board:

32 cells are processed in parallel using bitwise operations.

The bits are added together using the following functions:

```
function add2(a, b)
  sum = a ^^ b
  carry = a & b
  return sum, carry
end

function add3(a, b, c)
  tmp = a ^^ b
  sum = tmp ^^ c
  carry = a & b | tmp & c
  return sum, carry
end
```
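To try the two adders outside PICO-8, here is a minimal Python model of them. It is only a sketch: `MASK` is an assumption standing in for PICO-8's 32-bit numbers, and Python's `^` plays the role of `^^`. The loop at the end checks that every bit position behaves like an independent 1-bit addition.

```python
MASK = 0xFFFFFFFF  # assumption: model PICO-8's 32-bit numbers


def add2(a, b):
    """Half adder applied to every bit lane at once."""
    return (a ^ b) & MASK, (a & b) & MASK


def add3(a, b, c):
    """Full adder applied to every bit lane at once."""
    tmp = a ^ b
    total = tmp ^ c
    carry = (a & b) | (tmp & c)
    return total & MASK, carry & MASK


# Each lane is an independent addition: a_bit + b_bit + c_bit = sum + 2*carry.
a, b, c = 0b1100, 0b1010, 0b0110
s, cy = add3(a, b, c)
for lane in range(4):
    bits = [(x >> lane) & 1 for x in (a, b, c)]
    assert sum(bits) == ((s >> lane) & 1) + 2 * ((cy >> lane) & 1)
```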

Let's use the following map to refer to the neighboring cells:

```
abc -- above
d.e -- current
fgh -- below
```

We get the sum of b+g using add2(above, below).

We get the sum of a+d+f and the sum of c+e+h by using add3(above, current, below). To access the sum of a+d+f or c+e+h we shift the result left or right by 1.
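The three column sums can be modelled in Python as below. This is a sketch under assumptions: it handles a single 32-bit word and ignores the bits that would have to come from the neighboring words at the edges, and which shift direction corresponds to the left column (a+d+f) depends on the bit order of the bitmap, which is not specified here. `column_sums` is a hypothetical helper name.

```python
MASK = 0xFFFFFFFF  # assumption: one 32-bit word, edges ignored


def add2(a, b):
    return (a ^ b) & MASK, (a & b) & MASK


def add3(a, b, c):
    tmp = a ^ b
    return (tmp ^ c) & MASK, ((a & b) | (tmp & c)) & MASK


def column_sums(above, current, below):
    """Return (sum, carry) pairs for the middle column and both side columns."""
    s0, c0 = add2(above, below)            # b+g: the current cell is excluded
    s, c = add3(above, current, below)     # full 3-cell column at each position
    s1, c1 = (s << 1) & MASK, (c << 1) & MASK  # column of the lower-bit neighbor
    s2, c2 = s >> 1, c >> 1                    # column of the higher-bit neighbor
    return (s0, c0), (s1, c1), (s2, c2)


# A vertical line of three cells at bit 1:
# bit 1 sees b+g = 2, while bits 0 and 2 each see a side column of 3.
print(column_sums(0b010, 0b010, 0b010))
```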

Now we have the 3 sums as 2 digit binary numbers:

```
sum     bit0  bit1
b+g:    s0    c0
a+d+f:  s1    c1
c+e+h:  s2    c2
```

We add these numbers one column at a time to get bit0 and bit1 of the final sum:

```
bit0, c = add3(s0, s1, s2)
sum, carry = add3(c0, c1, c2)
bit1 = sum ^^ c
```
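As a sanity check, the column-wise addition can be run over every possible combination of the three partial sums. The Python sketch below works on single bits; `neighbor_bits` and `check_all` are hypothetical names. It confirms that bit0 and bit1 match the low two bits of the true neighbor count, and that bit1 set with carry clear happens exactly when the count is 2 or 3.

```python
from itertools import product


def add3(a, b, c):
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def neighbor_bits(s0, c0, s1, c1, s2, c2):
    bit0, c = add3(s0, s1, s2)        # add the bit0 column
    total, carry = add3(c0, c1, c2)   # add the bit1 column
    bit1 = total ^ c                  # fold in the carry from the bit0 column
    return bit0, bit1, carry


def check_all():
    # b+g ranges over 0..2; each side column ranges over 0..3.
    for n0, n1, n2 in product(range(3), range(4), range(4)):
        bit0, bit1, carry = neighbor_bits(
            n0 & 1, n0 >> 1, n1 & 1, n1 >> 1, n2 & 1, n2 >> 1)
        n = n0 + n1 + n2
        assert bit0 == n & 1
        assert bit1 == (n >> 1) & 1
        assert (bit1 & ~carry & 1) == (1 if n in (2, 3) else 0)
    return True


check_all()
```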

A cell is alive in the next generation if:

- it has 3 neighbors, or
- it's alive now and has 2 neighbors.

In the 1st case: 3 is 11 in binary, so the sum is 3 if bit0=1, bit1=1 and carry=0 (a set carry means 4 or more neighbors). The formula is bit0 & bit1 & ~carry

Similarly, in the 2nd case 2 is 10 in binary, so the formula is: current_cell & ~bit0 & bit1 & ~carry

The result is: bit0 & bit1 & ~carry | current_cell & ~bit0 & bit1 & ~carry

This can be simplified to (bit0 | current_cell & ~bit0) & bit1 & ~carry, which is equal to (bit0 | current_cell) & bit1 & ~carry
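Putting the pieces together, here is a Python sketch of one full generation that can be compared against a plain neighbor-counting version. It is a model, not the cart's code: rows are Python integers of `WIDTH` bits, and cells outside the board are assumed to be dead (the cart may handle edges differently). `next_row`, `step` and `naive_step` are hypothetical names.

```python
WIDTH = 8                 # assumption: a small board for the example
MASK = (1 << WIDTH) - 1


def add2(a, b):
    return a ^ b, a & b


def add3(a, b, c):
    tmp = a ^ b
    return tmp ^ c, (a & b) | (tmp & c)


def next_row(above, current, below):
    """Compute a whole row of the next generation with bitwise operations."""
    s0, c0 = add2(above, below)
    s, c = add3(above, current, below)
    s1, c1 = (s << 1) & MASK, (c << 1) & MASK
    s2, c2 = s >> 1, c >> 1
    bit0, k = add3(s0, s1, s2)
    total, carry = add3(c0, c1, c2)
    bit1 = total ^ k
    return (bit0 | current) & bit1 & ~carry & MASK


def step(rows):
    padded = [0] + rows + [0]  # dead rows above and below the board
    return [next_row(padded[y], padded[y + 1], padded[y + 2])
            for y in range(len(rows))]


def naive_step(rows):
    """Reference version: count the 8 neighbors of every cell one by one."""
    def cell(x, y):
        inside = 0 <= x < WIDTH and 0 <= y < len(rows)
        return (rows[y] >> x) & 1 if inside else 0

    out = []
    for y in range(len(rows)):
        row = 0
        for x in range(WIDTH):
            n = sum(cell(x + dx, y + dy)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dx, dy) != (0, 0))
            if n == 3 or (cell(x, y) and n == 2):
                row |= 1 << x
        out.append(row)
    return out


glider = [0b00000010, 0b00000100, 0b00000111, 0, 0, 0]
assert step(glider) == naive_step(glider)
```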

To speed up the main loop the add2 and add3 functions are inlined, redundant computations are removed and the loop is unrolled 4 times.

Drawing:

The board is expanded from 8 bits to 32 bits using a lookup table and poked to screen memory.
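The expansion works because PICO-8 screen memory uses 4 bits per pixel, so 8 cells become 8 nibbles, i.e. one 32-bit value. A Python sketch of such a lookup table follows. The assumptions are labeled in the code: live cells are drawn in color 7 and dead cells in color 0, bit 0 of the board byte is the leftmost pixel, and the 32-bit value is written little-endian the way `poke4` writes it. `expand`, `TABLE` and `draw_byte` are hypothetical names.

```python
LIVE_COLOR = 7  # assumption: live cells use color 7, dead cells color 0


def expand(byte, color=LIVE_COLOR):
    """Spread 8 cell bits into 8 four-bit pixels (one 32-bit word)."""
    word = 0
    for i in range(8):
        if (byte >> i) & 1:
            word |= color << (4 * i)  # assumption: bit i -> i-th pixel from the left
    return word


# One entry per possible board byte, built once up front.
TABLE = [expand(b) for b in range(256)]


def draw_byte(screen, offset, board_byte):
    """Model of poke4: write the expanded word as 4 little-endian bytes."""
    screen[offset:offset + 4] = TABLE[board_byte].to_bytes(4, "little")


screen = bytearray(8)
draw_byte(screen, 0, 0b00000101)  # cells 0 and 2 alive
print(screen.hex())
```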

I tested storing the board both in a Lua table and in memory, and using memory is faster.

I guess poke4(address,a,b,c,d) is faster than table[i],table[i+1],table[i+2],table[i+3]=a,b,c,d. Also, when drawing I need to access individual bytes. With a table I would have to shift the 32 bit numbers and mask by 0xff to access the bytes.
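The extra work for byte access can be illustrated with a small Python model. It is only a sketch of the trade-off, not a benchmark: `memory` stands in for PICO-8 memory (one `peek` per byte), `words` stands in for a table of 32-bit numbers, and little-endian byte order is assumed. The function names are hypothetical.

```python
import struct

words = [0x12345678, 0x9ABCDEF0]                 # board kept as 32-bit numbers
memory = bytearray(struct.pack("<2I", *words))   # same data as raw bytes


def byte_from_memory(i):
    """Memory version: the byte is directly addressable."""
    return memory[i]


def byte_from_words(i):
    """Table version: pick the word, shift the byte down, mask with 0xff."""
    return (words[i // 4] >> (8 * (i % 4))) & 0xFF


assert all(byte_from_memory(i) == byte_from_words(i) for i in range(8))
```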
