
There seems to have been a change in the virtual CPU cost of +=.

Previously, in 0.2.4b, both x=x+y and x+=y cost 1 cycle.
Now, in 0.2.5g, x=x+y costs 1 cycle while x+=y costs 2 cycles.
(Where x and y are locals)

The same happens with other operators that cost 1 cycle, e.g. -= vs. - and &= vs. &.

This feels like a bug, since I wouldn't expect x+=y to be costlier than x=x+y.

The code below shows the performance difference.

function testme_calib(name, func, calibrate_func, ...)
  -- based on https://www.lexaloffle.com/bbs/?pid=60198#p
  local n = 1024

  -- calibrate
  flip()
  local unused -- i am not sure why this helps give better results, but it does, so.

  local x,t=stat(1),stat(2)
  for i=1,n do
    calibrate_func(...)
  end
  local y,u=stat(1),stat(2)

  -- measure
  for i=1,n do
    func(...)
  end
  local z,v=stat(1),stat(2)

  -- report
  local function c(t0,t1,t2) return(t0+t2-2*t1)*128/n*256/60*256*2 end -- *2 for 0.2.x

  local s=name.." :"
  local lc=c(x-t,y-u,z-v)
  if (lc != 0) s..=" lua="..lc
  local sc=c(t,u,v)
  if (sc != 0) s..=" sys="..sc

  printh(s)
  print(s)
end

function testme(name, func, ...)
  return testme_calib(name, func, function() end, ...)
end

testme("+", function(x,y) x=x+y end, 1, 2)
testme("+=", function(x,y) x+=y end, 1, 2)
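
For anyone puzzling over the constant in c(): as far as I can tell, 128*256*256 is 8388608, which would be the 0.2.x budget of 8M virtual cycles per second (hence the "*2 for 0.2.x" comment), and /60*2 is the same as /30, since stat(1) reports CPU as a fraction of a 30 fps frame. The interleaved multiplications and divisions presumably keep intermediate values inside PICO-8's 16.16 fixed-point range. A minimal sketch of just that conversion, where frac_to_cycles is a hypothetical helper name that does not appear in the code above:

```lua
-- hypothetical helper for illustration, not part of the original test code
-- frac_total: stat(1) difference accumulated over n calls,
--             as a fraction of one 30 fps frame
-- n:          number of iterations that were measured
function frac_to_cycles(frac_total, n)
  -- same operation order as c() above:
  -- 128*256*256 = 8388608 cycles per second (assumed 0.2.x budget),
  -- and /60*2 equals /30 frames per second
  return frac_total*128/n*256/60*256*2
end

-- one extra cycle per iteration over 1024 iterations would show up
-- as a stat(1) difference of 1024*30/8388608
local per_call = frac_to_cycles(1024*30/8388608, 1024)
```

Under those assumptions, per_call comes out as 1 cycle per call, which matches the size of the difference reported between x=x+y and x+=y.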

P#126995 2023-03-11 20:30 ( Edited 2023-03-11 20:31)

Huh! I remember feeling like RP-8 took an unexpected CPU hit at some point, I wonder if this was the cause...

Oh, wow, this makes a huge difference now - like 4% CPU on the synth filter inner loop alone. Would be very nice to get this one fixed.

P#127102 2023-03-13 18:13 ( Edited 2023-03-13 18:17)

+1

P#127104 2023-03-13 20:15

If anything, it should cost less, because conceptually you don't need to parse a second token to know which two vars are involved, and you only need to obtain one reference for both the input and output vars, i.e. you implicitly know both are x.

I dunno if the interpreter or the bytecode is aware of this, though. It should be, but Lua's code is written to have a very small memory footprint so that it will fit in an embedded system's instruction cache, so it doesn't tend to have a lot of exceptional-case handling.

P#127128 2023-03-14 14:33 ( Edited 2023-03-14 14:38)

In addition, it looks like x*=y costs 3 cycles now, whereas x=x*y costs just 2, so this affects all operators, not just ones that cost 1 cycle.

P#127313 2023-03-19 04:01 ( Edited 2023-03-19 04:01)

this seems to only affect local vars -- I tested with my load #prof tool and confirmed it with this cart:

function _draw()
 local x,y=2,3
-- x,y=2,3
 for i=1,20000 do
--  x+=y
  x=x+y
 end
end
P#127438 2023-03-22 03:07
