Minor Optimization to Occlusion Culling #107839
Tested locally, it works as expected. Code looks good to me.
I spotted a very slight performance improvement (from 2480 FPS to 2498 FPS with a 64×64 viewport to test for CPU bottlenecks) with a Linux x86_64 release optimized export template binary. It's consistently faster even over several minutes and across multiple runs, just a very slight increase. In real world projects, this change is probably more about saving power than actually increasing FPS (although remember that on mobile devices, lower CPU power usage can let the GPU use more power, therefore increasing FPS in GPU-bound scenes).
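To put the FPS figures above in per-frame terms, here is a minimal sketch of the arithmetic. The helper name `frame_time_ms` is made up for illustration; only the two FPS values come from the measurement above.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# FPS values reported above (64x64 viewport, CPU-bound test).
before_fps, after_fps = 2480.0, 2498.0

before_ms = frame_time_ms(before_fps)
after_ms = frame_time_ms(after_fps)

# CPU time saved per frame, in microseconds.
saved_us = (before_ms - after_ms) * 1000.0
# Relative FPS gain, in percent.
speedup_pct = (after_fps / before_fps - 1.0) * 100.0

print(f"{before_ms:.4f} ms -> {after_ms:.4f} ms per frame")
print(f"saved {saved_us:.2f} us per frame ({speedup_pct:.2f}% more FPS)")
```

At these frame rates the difference works out to roughly 3 microseconds per frame, which is why it only shows up in a CPU-bound test like this one.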
PC specifications
- CPU: AMD Ryzen 9 9950X3D
- GPU: NVIDIA GeForce RTX 5090
- RAM: 64 GB (2×32 GB DDR5-6000 CL30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 42)
Converted to draft for now. There are some additional optimizations I want to make, but they will only make sense after #108347 has been merged.
I rebased on top of 4.5.beta-5. I made further adjustments to the logic and structure so that more functions are inlined. I tested these changes individually and found an improvement after each. Testing with occlusion_culling_mesh_lod, I get ~700 FPS on 4.5.beta-5 and ~900 FPS with the PR.
Thanks!
Minor performance improvement to occlusion culling as well as a partial fix for #106184 (I'm working on something more comprehensive).
The performance gain comes from swapping one version of `Projection::xform` for another that is inlined. The two functions aren't a 1-to-1 match, so a minor rework of the logic was required. After further testing, I believe the issue of small objects self-occluding (as described by @JFonS in #52545) was fully resolved in #94210. I’ve therefore updated the logic to be less conservative, resulting in a small improvement in occlusion rate, which can be seen in the video below. The test project is similar to the MRP in #106184. The camera is placed inside a box occluder, so ideally, everything should become occluded.
output.mp4
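The description does not say which two versions of the function were swapped. As a generic illustration of why two point-transform variants of a projection matrix may not be drop-in replacements for each other, here is a small sketch in Python (the engine code itself is C++). One variant returns homogeneous clip-space coordinates, the other also performs the perspective divide. The function names, the row-major matrix layout, and the sample matrix are all assumptions made for this example and are not taken from the PR.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec4 = Tuple[float, float, float, float]
Mat4 = Sequence[Sequence[float]]  # hypothetical row-major 4x4 layout


def xform_homogeneous(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a 4x4 matrix by a 4D vector; no perspective divide."""
    x, y, z, w = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
    return (x, y, z, w)


def xform_point(m: Mat4, p: Vec3) -> Vec3:
    """Transform a 3D point (w = 1) and divide by the resulting w."""
    x, y, z, w = xform_homogeneous(m, (p[0], p[1], p[2], 1.0))
    return (x / w, y / w, z / w)


# Simplified perspective-style matrix, chosen only to keep the numbers readable.
proj: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, -0.2],
    [0.0, 0.0, -1.0, 0.0],
]

point: Vec3 = (1.0, 2.0, -4.0)

clip = xform_homogeneous(proj, (*point, 1.0))  # caller must divide by w itself
ndc = xform_point(proj, point)                 # divide already applied

print("clip:", clip)
print("ndc: ", ndc)
```

Code written against the first variant has to handle `w` itself (for example, to detect points behind the camera before dividing), so switching to the second variant means restructuring those checks rather than just changing the call.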
Regarding the performance improvements, I tested using occlusion_culling_mesh_lod. On my machine, it went from roughly 1.2 ms/frame (~830 FPS) to 1.05 ms/frame (~950 FPS). Since that project is specifically designed to be bottlenecked by occlusion culling, I also tested using the tps-demo. There, however, I wasn’t able to observe any noticeable performance improvement (~122 FPS before and after).