6

Here is a simple implementation of merge sort in 97 tokens. It uses less PICO-8 CPU than anything else I’ve tried (with a caveat, see below).

A small benchmark comparing CPU usage, sorting an array of 200 random numbers:

Note that non-randomised quicksort suffers from very bad performance on quasi sorted lists, so the above benchmark does not represent what you would get in real life. Always test in your own application.

 ```function sort(t, a, b) local a, b = a or 1, b or #t if (a >= b) return local m = (a + b) \ 2 local j, k = a, m + 1 sort(t, a, m) sort(t, k, b) local v = { unpack(t) } for i = a, b do if (k > b or j <= m and v[j] <= v[k]) t[i] = v[j] j += 1 else t[i] = v[k] k += 1 end end ```

In real life merge sort performs worse than e.g. quicksort because it suffers from poor locality of reference (cache misses), but in a high level language such as Lua this becomes meaningless.

To any computer scientist this specific implementation would be badly written and would appear to perform extremely poorly because {unpack(t)} effectively copies the whole array n×log(n) n times. However, in PICO-8 world this function is ridiculously fast because for the virtual CPU the operation is almost free. Use at your own risk because the actual CPU running PICO-8 will still perform the operations!

Another limitation: does not support arrays of more than 16384 elements because of the overflow in (a+b)\2. Just replace with (a/2+b/2)\1 if necessary.

Edit: here is a version that performs log(n) full array copies and is thus much lighter on the CPU. It is also about 10% faster. The drawback is that it’s now 111 tokens, but that’s still pretty good.

 ```function sort(t) local function f(a, b) if a < b then local d = (b - a) \ 2 + 1 f(a, a + d - 1) f(a + d, b) local v, j, k = { unpack(t, a, b) }, 1, d + 1 local vj, vk = v[j], v[k] for i = a, b do if (not vk or j <= d and vj <= vk) t[i] = vj vj = v[j] j += 1 else t[i] = vk vk = v[k] k += 1 end end end f(1, #t) end ```
P#78990 2020-07-06 16:04 ( Edited 2020-07-07 10:18)

:: Felice

What is the actual cost on unpack()? I haven't timed it. If it's too cheap then we'll end up with "valid" carts running too slowly on lesser hardware again.

Intuitively, I would not expect "{ unpack(t) }" to be fast at all.

P#78994 2020-07-06 17:24 ( Edited 2020-07-06 17:37)

That’s a valid concern. I posted new code that does not perform those excessive numbers of array copies, at the cost of 20 tokens. I also added explanations about quicksort performance, because I focused on the worst case behaviour and from my rough benchmark the reader may get a bad impression of its performance in the average case.

P#79002 2020-07-06 22:17 ( Edited 2020-07-06 22:17)
:: zep

cpu police -- good and bad news.

This bad news is that unpack() still needs to be costed (my bad) -- it can't stay like that because the real cpu cost should never be a required consideration. For example, at 2 cycles per item, the second version would cost ~0.40 for the same test.

The good news is that with the same cpu cost, the second version would remain quite fast: around ~0.13 instead of 0.11. Does that sound reasonable?

It is something of an oddity that {unpack(t)} is becoming the standard fastest way to do a shallow copy of a table. Perhaps that needs to be reviewed also.

P#79007 2020-07-07 08:00 ( Edited 2020-07-07 08:02)

2 cycles per item?
Would that kill unpack?
in it’s current form, it is really handy to replace boring:
local a,b,c,d=m[1],m[2],...
to:
local a,b,c,d=unpack(m)

P#79010 2020-07-07 12:44 ( Edited 2020-07-07 16:05)
:: Felice

@freds72

P#79016 2020-07-07 17:29 ( Edited 2020-07-07 17:29)
:: Felice

@zep

By the way, in the absence of a built-in function to split or copy a table, {unpack(t)} should be the ideal way to do a shallow copy of a array, especially if you want to copy only a portion of it, e.g. {unpack(t,a,b)}.

You should assume people will frequently use it for that when you set costs.

P#79049 2020-07-08 09:59 ( Edited 2020-07-08 10:00)
:: Felice

Side note: It's kinda funny that we're talking about balancing a game engine instead of a game. XD

P#79050 2020-07-08 10:01
:: rilden

@samhocevar did you try shellsort?
This is an implementation using 70 tokens:

 ```gaps={701,301,132,57,23,10,4,1} function shellsort(a) for gap in all(gaps) do for i=gap+1,#a do local x=a[i] local j=i-gap while j>=1 and a[j]>x do a[j+gap]=a[j] j-=gap end a[j+gap]=x end end end```

Sorting an array of 200 random numbers uses 5-9% cpu, 6% cpu on average.

Edit: I made a mistake in the middle loop, looks like it's using 9% cpu on average on an array of 200
I also tried sorting the array {1,2,3,4} with your second version of mergesort and it's giving back a result of {1,1,3,3}

P#79173 2020-07-11 21:15 ( Edited 2020-07-11 22:28)