I'm going to start with my general understanding and label questions [Q#] in the hope someone can fill in the blanks. Consider the discussion of 'cost' in terms of speed/cpu. I'm hoping there's a general concept I'm missing that will clean up the flurry of questions around [Q2]; regardless, please bear with me:
General code is written top-down like the below, where you define the function at the top and then simply call said function below that point to access the block inside. I don't fully understand the cost, in terms of lua itself, associated with defining 'function a()end' (the pico8 wiki has general pico8 cycle costs). My understanding, though, is that the program won't look inside the function until I call it, so there is simply some generic predefined 'base cost' (cpu cycles) to defining a function, and this cost is not affected by the number of parameters or whatever is inside the function. tldr: it's always a flat cost each time the interpreter runs down that part of the page and sees you want to define a function. So this usually amounts to the general idea of 'define the function once at the top of the code' and then call it as many times as you need, wherever you need to after:
function a()end
for i=1,100 do a() end
--vs:
for i=1,100 do function a()end a()end
Defining the same function 100 times is bad.
However, defining a function, like most parts of lua, can be optimized by limiting its scope. Again, my understanding of scope is limited (puns), but as a functional explanation it amounts to 'smaller scope of an item = less cpu cost of the item'. Making a function local makes it apply to just the remaining part of the block it is currently in, and you can further limit the scope within a larger block with 'do end'.
--code above--
do
 local function a()end
 for i=1,100 do a() end
end
--code below--
I think I got the general idea above. My main confusion lies in what happens when you toss the idea of a gameloop into this picture. I'm assuming there is no secret difference in how _update() and _draw() handle functions, outside of the idea that _draw() might skip frames, so let's take the example below, which seems to be how most games orient things. My understanding in pico8 being: everything runs top-to-bottom 1x at the start of a game (unless I'm doing function calls in the middle of the ocean, the stuff in the functions will not be run yet)...then everything inside _init() runs 1x...then everything inside _update() and then _draw() runs over and over xx times per second.
function _update()a()b()end
function a() end
function b() end
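The run order described above (top level once, then _init() once, then _update() and _draw() repeatedly) can be sketched in plain Lua. This is only an illustration: run_frames is a hypothetical stand-in for the engine, and the order table is made up so the sequence can be inspected. In a real cart pico8 makes these calls itself, so run_frames would not exist.

```lua
-- records the order in which things run
order = {}

order[#order+1] = "top-level"   -- runs once, while the file is loaded

-- defining these does not run their bodies yet
function _init()   order[#order+1] = "init"   end
function _update() order[#order+1] = "update" end
function _draw()   order[#order+1] = "draw"   end

-- hypothetical stand-in for the engine loop (assumption, not pico8 api)
local function run_frames(n)
  _init()
  for frame = 1, n do
    _update()
    _draw()
  end
end

run_frames(2)
-- order is now: top-level, init, update, draw, update, draw
```

Nothing inside a function body is recorded until that function is actually called, which matches the "won't look inside until I call it" understanding.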
If we wanted to define a() and b() as local, the next example below shows the change in order necessary to do so...and there seems to be no change in cost (ctrl+p testing values in pico8) between a local function a() and a global function b()...since they seemingly have the same level of scope and are only defined 1x each (is my understanding as to why). [Q1] So that trope I hear from the love2d community and sometimes in pico8 to 'just redundantly define everything as local' is wrong...but doesn't inflict a penalty...so it's really just a habit builder to protect against mistakes where u otherwise would find benefit from limiting the scope...?
local function a(x) return x^4 end
function _update()for i=1,10000 do a(10)b(10)end end
function b(x) return x^4 end
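As a rough sketch of what differs between a() and b() in that example: a global function is stored as a named field of the environment table, while a local one is not stored there at all and is reached by _update() as a captured variable. The snippet assumes Lua 5.2 or later (where _ENV exists, as it does in pico8); the a_is_global and b_is_global names are made up for illustration. It shows where each function lives and measures nothing.

```lua
local function a(x) return x^4 end   -- local: not a field of the environment
function b(x) return x^4 end         -- global: stored as _ENV.b

-- _update reaches a as a captured local and b via a lookup by name
function _update()
  return a(10), b(10)
end

a_is_global = rawget(_ENV, "a") ~= nil   -- false
b_is_global = rawget(_ENV, "b") ~= nil   -- true
```

Both functions are still defined exactly once here, which is consistent with ctrl+p showing no visible difference for a handful of calls.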
Okay, so we now have base cases for gameloop and nongameloop above. So let's look below, where the cases overlap. In these situations I'm left confused and often wondering if there is some kind of benefit to 'defining the function within a block within the gameloop'...or whether to just revert to the above, where u toss the function outside everything with all the other global functions, aka where it only gets defined 1x at the start of the game and never again. Take this info() example:
function _draw()info()end
function info()
 if time()==2 then print("<5")end
 if time()==3 then print("<5")end
 if time()>5 then print(">5")end
end
[Q2] Let's say I wanted to keep everything in this truly horrible format, but create a function for the redundant print("<5") blocks. If the function I want to use is: function p()print("<5")end
Which (A/B/C/D/?) position should I use to define function p()... and why?
function _draw()
 --D) local function p()print("<5")end
 info()
end
--C) local function p()print("<5")end --if [Q1] was correct then this position and A are the same thing
function info()
 --B) do local function p()print("<5")end
 if time()==2 then p()end
 if time()==3 then p()end
 --B) end
 if time()>5 then print(">5")end
end
--A) function p()print("<5")end
I would think B would be the correct position...but then I consider the fact that B is a function being defined every single frame in a loop...and the code where we call p() is only actually running for two frames in the game. I would think any potential saved overhead in those two frames might be undone in every other frame by this fact, and that A) would be the appropriate answer. But what if info() only ran for specific frames, and every time it did run p() was called without exception? I would think then B is indeed the correct answer, since it mimics the nongameloop example at the top of the page...But that still assumes the cost of defining it each frame is less than the amount saved by calling a local function...would that change based on how many times I need to call that local function?
What if info() was in a for-loop that ran a thousand times in one of those combinations....what if it was just p() itself?
function _draw()
 --D) local function p()print("<5")end; [Q3] is D now the best position?
 for i=1,1000 do p() end
 info()
end
[Q4]What if environments come into play...a local function defined in the current environment each frame of the gameloop vs a global function linked through a metatable from another environment that's defined just 1x outside the gameloop. Here's an earlier example I had: If the rest of the game is defined in the environment of _G = _ENV _G.__index = _G and if add_bullet(...) is called each time in _update when the player presses the fire button...and there are variable chances of the bullets fired actually being of the type (10%/50%/90%) that call the function trimg_xy() if at all... where should I place the function trimg_xy? in A. or B. or C. or other?
I guess all these questions come back to: [Q5] how does all of this compare to the cost of defining a function 1 or more times every frame within the gameloop vs 1x outside of it? 'Cause that's effectively what this is...do we define a function 1x like A) outside of everything...or define it x times every frame...and is there a different payout depending on which, per each conceivable scenario?
Regarding Q1, I don't think I've come across that advice, and I'm not sure why it would apply to love2d. In pico-8, one of the joys is trying to optimize things ridiculously, but in a full modern game I would expect such things to not matter in the slightest. In any such case, you're still using Lua, which itself is only ever so efficient, but also in love2d I was under the impression that no artificial limits exist, so the full processor is just running code as normal. I don't use love2d myself, because I have my own custom-made engine that's a lot closer to pico-8 in terms of api, but I've never noticed any differences in local versus global. I haven't tested such things directly, but I would expect any benefits from using only globals (which would mainly be having less stuff to search through to find the variable during lookup) to be mitigated by the fact that you're creating new variables constantly.
Regarding Q2, I would simply advise against it. Lua doesn't inline functions no matter how simple, and I'm pretty sure a function call is more expensive than branching. This is probably counter-intuitive if you've done C or the like before, since branching is looked at as really expensive, but since Lua is an interpreted language, all commands and operators just go through the big switch block under the hood anyway. However, function calls require creating a new stack frame that the interpreter has to keep track of. If you did commit to that form, though, I wouldn't expect any way of doing it to matter much. You're still looking up a function and calling it.
Regarding Q5, I would definitely expect 1x to be better, because you're adding a chunk into memory, which could cause shuffling behind the scenes and more calls to the garbage collector. To that end, I'd like to point out something that isn't addressed in any of your paragraphs:
_update is not required to be defined that way. All the engine looks for is a global variable with that name. If found, the engine will attempt to call it for the game loop. While having it be one unchanging function seems to be the default assumption, it's not what I prefer to do myself. I instead define a separate "update" function for each part of a game and then set _update equal to whichever function is currently relevant.
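A minimal sketch of that approach, assuming two made-up game states: update_title, update_game and the state variable are hypothetical names for illustration only. The global _update is just reassigned to whichever function should run next.

```lua
-- hypothetical state marker, only here so the switch can be observed
state = "none"

function update_title()
  state = "title"
  -- pretend the player pressed start: hand the loop over to the game
  _update = update_game
end

function update_game()
  state = "game"
end

-- the engine only looks up the global named _update
_update = update_title
```

Each state function is still defined exactly 1x at load time; only the variable that the engine calls changes.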
local variables are faster than globals, which is also true for a function (using the local syntax), but the calling cost makes that gain negligible.
as you already noticed, calling a function is not cheap - usual tradeoff between tokens (sharing code) and speed (duplicate code).
Local functions are ‘compiled’ once by lua (good) but will incur a garbage collection cost. This can be significant if the function closure holds unique objects.
note: this is true for any tight loop, creating/deleting large tables has a cost
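A small sketch of the garbage point, under the assumption that the inner function captures a fresh local on each call: make_printer is a hypothetical helper, not something from the thread. Every call hands back a distinct function object, and each of those is something the collector later has to clean up.

```lua
-- hypothetical helper: the inner function captures 'msg',
-- so each call has to build a new closure object
function make_printer(msg)
  return function() return msg end
end

p1 = make_printer("<5")
p2 = make_printer("<5")

-- same code, same text, but two separate objects
distinct = p1 ~= p2   -- true
```

The function body itself is compiled only once; what repeats per call is the creation of the closure object around it.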
and as usual, testing within your particular game engine is key.
For example, have you measured the cost of the ‘dub’ function? certainly good for saving tokens but possibly bad for perf.
Point is, it is always difficult to know where perf problems are unless every step is challenged/measured.
@freds72 Calling cost between a local and global function is negligible... to what degree though? I think of the number of calls in a single frame of a game and specialty algorithms that might loop a lot of these calls (i.e. pathfinding) and am left thinking it becomes noteworthy at some point. I would consider anything registering using ctrl+P noteworthy.
The argument goes back to testing each situation independently? Okay, but I've done that and have been unable to find a clear answer. Ultimately I was having a very hard time testing and measuring those differences. Usually I'd then define a simplified code example and study the differences to guide me in how to optimize larger examples...but the differences really are quite immense to the point where I'm unable to get a good read or find some general rule this way. So I seek to expand the search to the structural workings of lua (however u want to phrase the 'under the hood' idea). Perhaps yes, the problem is the differences are indeed "so small" that it's making testing difficult/inconclusive.
And yes, stuff like dub() is an intentional trade of cpu for tokens...which brings me to one of the other points, which is that a lot of these examples seek to optimize stringification functions. So in the case of dub()... what if I defined it as a local function within the bullet function within the local environment? An answer might 'again' be to test it...but again, the tests aren't conclusive enough. Regardless, there should be a specific go-to answer like in all the other cases where we mash a few lua blocktypes together. The OP was asked in the hope of finding some 'under the hood' concepts to provide guidance on how to deal with these questions. I don't accept the idea that if the tests aren't close enough to see a difference it's irrelevant...lua follows structured rules, so there must be an answer. And yes, I understand there are 'other points' to optimizing cpu for these types of concepts, but the OP is focused on differences between specific types.
Personally I find it frustrating to not have an obvious go-to answer to the [Q#] in the op. If we know lua...we should have an obvious answer, testing should simply be a confirmation of the answer and a test to see if pico8 actually respects it. https://www.lua.org/gems/sample.pdf usually guides my understanding of where things go in the finicky scope of 'where a is best placed in relation to b' kinda things, but I am unable to extract answers to the OP from it. I'm hopeful I missed something.
Lua perf tips doc provides some basis.
I suggest not using only ctrl+p to measure perf - it measures the whole program, so improvements are going to be hidden.
- take the diff of stat(1) to measure cpu cost of a specific area of the code and compare 2 implementations
- use: https://www.lexaloffle.com/bbs/?pid=104795 to bench a bit of code (don’t try to measure too much at once)
- use the wiki to get a sense of the cpu cost of each pico8 functions
And remember that finding perf bottlenecks can be counter-intuitive, e.g. first find where most of the time is spent (using a stat(1) diff), and then dig into that area.
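The stat(1) diff idea could look roughly like the sketch below. measure, la and ga are hypothetical names, and the os.clock fallback is an assumption added only so the snippet also runs in stock Lua outside pico8. The loops live inside the measured functions so that la() is really reached as a local and ga() as a global.

```lua
-- clock source: stat(1) inside pico8, os.clock elsewhere (fallback is an
-- assumption for running this sketch outside pico8)
local now = stat and function() return stat(1) end or os.clock

-- hypothetical helper: run body(n) once and return the clock difference
function measure(body, n)
  local t0 = now()
  body(n)
  return now() - t0
end

local function la() end   -- local candidate
function ga() end         -- global candidate

-- empty loop first, so loop overhead can be subtracted from the others
cost_empty  = measure(function(n) for i = 1, n do end end, 1000)
cost_local  = measure(function(n) for i = 1, n do la() end end, 1000)
cost_global = measure(function(n) for i = 1, n do ga() end end, 1000)
```

Comparing cost_local - cost_empty against cost_global - cost_empty isolates the call itself; in a cart the results can be printed from _draw(), and repeating the measurement a few times helps when the differences are tiny.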