Use LocalVector on GDScriptInstance::members #105555
DeeJayLSP wants to merge 1 commit into godotengine:master
It seems upstream made this slower (probably #100944). However, changing the members to a
Some regressions are to be expected from resize changes, by pure chance. If this is significantly impacted by it, it definitely needs a better solution in general. I'm not sure how TightLocalVector could even help unless it's being resized every single frame; it has no effect if there's no resize. Do you have a repro for us to toy with?
I use this stress test for every PR that affects GDScript performance: https://github.com/GaijinEntertainment/godot-das/tree/master/examples/bunnymark

After further testing, I noticed there is a performance improvement only when there are many instances:

6000 bunnies:
20000 bunnies:
I didn't bother measuring with a regular LocalVector, as all its results showed lower performance. It seems a bit situational. I'll report back after testing on production builds, which matter the most.
The proportions are the same, with or without I admit that back then I was pairing the changes of this PR with #90082, which on its own already gave GDScript a massive boost. I decided to test everything again with the changes from both #90082 and this PR on top of the current master. At 5000 bunnies:
At 10000 bunnies:
At 20000 bunnies:
At 50000 bunnies:
The results are too unstable to draw a conclusion, so I think we should probably wait for #90082. A regular LocalVector seems more stable than a TightLocalVector.
While experimenting with how fast it would be with a I have modified so it makes and assigns a It seems whatever made the FPS drop is now gone (not sure if it was this missing change, as it was fine with
Calinou left a comment:
Tested locally, it works as expected. I'm getting mixed performance results though, so it's hard to say if this is really a win.
Benchmark
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 42)
Using a release export template binary (production=yes lto=full) with the GDScript version of the bunnymark at https://github.com/GaijinEntertainment/godot-das/tree/master/examples/bunnymark
| Count | master | This PR |
|---|---|---|
| 6,000 bunnies | 566 FPS | 550 FPS |
| 20,000 bunnies | 159 FPS | 159 FPS |
| 50,000 bunnies | 59 FPS | 60 FPS |
That's why I say we should wait for #90082 or something similar. Every time a PR touches GDScript or LocalVector, the outcome of this one changes. But in all cases, when combined with #90082, performance was better. Out of curiosity, I applied this PR on top of some changes from #90082 in a production build (6000 bunnies, as that was the case affected negatively):
The same improvement ratio as above can be seen with 10000 or 50000 bunnies.
Rebased so I could check the benefits from #106020. While the profiler shows that the overhead of a At 10000 bunnies, I can measure a 9% performance improvement with a production template build, but at 1000 bunnies the result varies between a 0.3% improvement and a 2.3% decrease. The performance decrease disappears above 2000 bunnies. I hope the issue with Variant isn't some EDIT: turns out most of I have tried to keep a regular Using
Looks like the problem was never inside Does anyone know a possible solution for this? Or is it too reliant on CoW?
It seems like whatever overhead in I don't know if doing this would create unwanted behavior or possibly cause crashes, but so far it recovers some of the performance loss (the increase in I think the lack of CoW is causing
Rebased after a very long time. By the time I opened this PR, I thought I knew how to read Here are a few changes I made:
In very recent Bunnymark tests, there should be no performance difference at 500 bunnies, maybe a 0.06% gain (from many runs with as few programs as possible running in the background plus Feral GameMode, as GDScript is extremely sensitive to fluctuations when it isn't performing heavy tasks). Lastly, when running before and after at the same time, memory usage is sometimes visibly unstable: sometimes before consumes many more MBs, sometimes after does. But when they match, after is always one MB lower than before. Also, the build is 4 KiB smaller. From a recent discussion in the Contributors Chat, the conclusion is that Will update the PR description later.
PR description updated with newer benchmarks and more considerations that don't rely on my earlier, mistaken profiler reading.
Switched back to regular



Modifies GDScriptInstance::members to use a LocalVector instead of Vector. This should be slightly faster and go easier on memory due to the lack of CoW.

Benchmarks
Project used for benchmarking: bunnymark_custom.zip (modify the line in _ready() that adds bunnies as needed). It should start and immediately spawn the desired number of bunnies. After 30 seconds, it prints the FPS over the last 15 seconds (the first 15 seconds are discarded to reduce fluctuation) and exits.
The command was something like:
for i in {1..3}; do gamemoderun ./before.x86_64 --no-header && gamemoderun ./after.x86_64 --no-header; done > bunny_count.txt
run on a machine with almost no programs running in the background. More than one run is necessary because GDScript performance is unstable and sensitive to fluctuations.

Unlike some previous benchmarks, I opted to keep the bunny counts low to stay closer to common use cases. Below these counts there is virtually no difference.
500 bunnies
Average increase of 0.31% over master
1000 bunnies
Average increase of 0.45% over master
5000 bunnies
Average increase of 1.29% over master
Other considerations
With a very high number of instances (probably around 50k bunnies), the improvement would reach around 10%, but that is nowhere near a common use case.
While the performance gains aren't that high, release template production builds on Linux are 8 KiB smaller. When running the before and after builds side by side while watching a system monitor, the after build usually uses a few KB less memory.