Overhaul and optimize Glow in the mobile renderer#110077
Overhaul and optimize Glow in the mobile renderer#110077Repiteo merged 1 commit intogodotengine:masterfrom
Conversation
bc3cef7 to
e5adc89
Compare
e5adc89 to
044ab0a
Compare
No, I intentionally duplicated the code since I expect these shaders to diverge even more as time goes on. Trying to share code between them is what led us to leaving a huge amount of performance on the table. This is a case where having optimized code for each platform is worthwhile and saves us literally milliseconds per frame. |
Why aren't specialization constants an option for this case? I suspect that specialization constants will be added at some point in the near future anyway to address performance issues from branching, even if we are using two entirely separate shaders. (In the past, I've seen a measurable performance gain when using the Linear tonemapper from simply turning: into: (which is exactly how this would compile with specialization constants, if I understand correctly) |
There was a problem hiding this comment.
Tested locally, it works as expected.
I noticed an issue with 3D resolution scaling on macOS + Metal (or MoltenVK) though. The glow buffer doesn't appear to be scaled, so it will appear at the wrong position if the resolution scale isn't 1.0. Note that this issue does not occur if using MetalFX Spatial, only with Bilinear (FSR 1.0 isn't supported on Mobile).
Testing project: test_glow_mobile.zip
| Scale 0.25 | Scale 1.0 | Scale 2.0 |
|---|---|---|
![]() |
![]() |
![]() |
Scale 0.25 with MetalFX Spatial (looks correct):
Whoops, I hadn't looked into this PR thoroughly enough: I see now that specialization constants were added AND I presume there's a substantial reason that specialization constants weren't an option for optimizations that prompted splitting |
|
With the Mobile renderer, the first level seems to add a lot of jaggies. This is especially the case with HDR 2D enabled:
MRP: glow-test.zip |
servers/rendering/renderer_rd/shaders/effects/tonemap_mobile.glsl
Outdated
Show resolved
Hide resolved
|
I'm struggling to review the code changes of this PR because I'm unfamiliar with both the existing and new glow gathering code. This is a "me" problem and not a problem with this PR. ...But I mention it, because it leads me to some thoughts that I figure are worth mentioning just in case it's helpful: This PR seems to be a combination of:
This is pie-in-the-sky thinking, but an alternative approach may be to add specialization constants to the tonemap process in a separate PR. This PR could explore specialization constants for both Forward+ and Mobile on different mobile and desktop hardware. There is also a part of the Godot PR process that I struggle with in general: although a single commit is best for merging, it makes things harder to review for a person not very familiar with the code when a PR combines multiple changes, like this one. Anyway, not sure if my thoughts are helpful. I understand people only have so much time to work on these things and there is some efficiency to be gained in the development process by bulking multiple changes into one commit. I'm not sure if the extra complexity during the review process balances that or if there is risk introduced by bugs that couldn't be caught during review due to the extra complexity of the single commit during the review process. (This is likely a discussion that is beyond this PR?) |
|
Oh! I've been so focused on finding any outstanding issues that I failed to voice my positive thoughts about this PR:
These changes in this PR are extremely valuable and I'm really looking forward to having these merged in! It will be a game changer in performance on mobile! |
What do people think of getting rid of Level 1 for Mobile and Forward+ altogether? Even if it's not as expensive on desktop GPUs, I suspect it still has a cost that almost nobody benefits from? Just spit-balling ideas here. Other than this, I'm not sure how this issue can be addressed... |
|
Here's a quick comparison of this PR with Godot 4.5 for all levels, 1 through 7:
The tool is fully automated, so it's easy to generate more images. Takes no time at all to try other configurations, just ask if you want to see. This test does highlight that level 1 is the only regression in this PR... |
I don't intend to make any improvements to glow level 1. I searched Godot projects on Github and couldn't find any that relied on level 1. In practice it is kind of useless because the kernel is so narrow that it doesn't look like anything. Ironically level 1 is the most expensive to compute because it is reading from the highest res source, and writing out the highest number of pixels. What this PR does to avoid that cost is simply downsample instead of blur level 1. So what you are comparing is a raw downsampled image to one that has already had an expensive layer of blur applied. They are not intended to match at all |
This makes sense and matches my understanding. This is why I suggested we maybe remove level 1 entirely, since it adds jaggies and doesn't work as expected. To do this, I guess we could hide level 1 in the editor and force its intensity to always My thought is that if level 1 is left in this state, it will both appear as a bug and also introduce risk of games being made to look worse if upgraded from a previous version that uses level 1 in the blur effect. There is also a possibility that in some scenes it looks acceptable, but other scenes show the jaggies... So glow is made more difficult to use because jaggies might appear in scenes other than the one the glow was initially configured for. An example is when you have rectangular, axis-aligned objects in your setup scene that only rotate off axis in other scenes. (Said differently, I'm concerned that this level 1 behaviour produces a bad user experience.) |
I have no problem doing that! |
Let's go with that approach then! One other wrinkle is how the "normalize" function works: level 1 should not contribute to the calculation of normalized levels, in case it's been set through script or by a resource that was saved with an earlier version of Godot. I think that should handle all the edge cases to make this a good user experience... |
|
Thanks! |
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
|
I just finished bisecting and it looks like this PR broke foveated rendering with OpenXR and the Vulkan Mobile renderer. The project I'm testing doesn't have glow enabled, so it's probably some random collateral damage, although, from looking at the diff I'm not entirely sure the cause yet |
|
@dsnopek Could it be a hardcoded framebuffer format somewhere? |
|
No difference for me. Still a poor performance on mobile. |
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
Cleared this up in the rendering meeting: it was the change of threshold default that was supposed to help with some scenes' jaggies, but this change of default has been reverted so we're back where we started with jaggies generated from level 1. clayjohn and I both don't have strong feelings regarding the behaviour of level 0 after this PR, so we're leaving it as-is and see if people report it as a serious usability issue. |
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
|
|
||
| tonemap_mobile.shader_version = tonemap_mobile.shader.version_create(); | ||
|
|
||
| for (int i = 0; i < TONEMAP_MODE_MAX; i++) { |
There was a problem hiding this comment.
| for (int i = 0; i < TONEMAP_MODE_MAX; i++) { | |
| for (int i = 0; i < TONEMAP_MOBILE_MODE_MAX; i++) { |
Currently not critical, but should be adjusted at some point (enum TonemapMode and enum TonemapModeMobile currently have 8 values)
There was a problem hiding this comment.
Good catch. Indeed this works fine, but only by chance. If either enum changes it might break
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
Reverts the default value of Environment.glow_hdr_threshold from 0.0 back to 1.0 to restore the expected glow appearance in existing projects. The default was inadvertently changed from 1.0 to 0.0 in PR godotengine#110077, which caused glow effects to render dramatically different across all rendering methods (Forward+, Mobile, and GL Compatibility). This broke backward compatibility with existing projects like the Kenney 3D Platformer starter kit. Changed files: - scene/resources/environment.h - servers/rendering/storage/environment_storage.h - drivers/gles3/effects/glow.h - drivers/gles3/rasterizer_scene_gles3.cpp - doc/classes/Environment.xml Setting the value back to 1.0 aligns with documented recommendations and restores visual consistency. Fixes godotengine#112469
















































Depends on #109971
Fixes: #98531
TLDR
This PR overhauls our approach to glow to use something that is both higher quality and more suitable for mobile devices. On the tested devices the performance improvement is upwards of 2X on all tested devices and reaches over 7X on one device.
Notes
The way glow works in master is first, we downsample and blur the screen using a two pass guassian blur. Using 2 passes allows us to take
kernel_size * 2samples instead ofkernel_size * kernal_size. Then, in the tonemap shader, we gather all the glow levels (each mip level of the texture), multiply them by their level amount and add them together.There are three bad things about this approach:
Profiling showed that texture reads were the biggest problem for glow. Most of those texture reads come from the tonemap pass. When using all 7 levels we do 7 texture reads per pixel in the tonemap pass and 6.6 texture reads per pixel during downsampling. If we eliminate the first level and skip to level 2, then we can bring that down to 1.6 texture reads per pixel during downsampling. That's a nice improvement, but it still leaves 8.6 samples per pixel because of the gather. Accordingly, we need to somehow move the gather into a lower resolution pass so we reduce the number of samples taken at full res.
In the end, I prioritized reducing texture reads over reducing passes. I use the technique described in https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/siggraph2015_2D00_mmg_2D00_marius_2D00_notes.pdf with a few key optimizations.
Broadly, the technique downsamples in a single pass using an optimized blur that is somewhere between a box blur and a gaussian blur. It relies on the fact that repeated applications of a box blur add up to a guassian blur. What we do is downsample with the blur to our target level, then we upsample back up to full res, adding our original blurred textures in as we go. This takes
2 * levels -1number of render passes, but allows us to do the gather as we upsample. So all the gather texture reads are done at low res. This massively reduces the sample count per pixel. With 7 levels using this technique we do only one sample in the tonemap shader and we do 3.0 samples per pixel during the downsample and upsample passes combined for a total of 4 samples per pixel.I then also did a few other optimizations:
Quality
Forward+

Mobile: Before

Mobile: After

Performance
The performance of each depends in part on the screen resolution
Adreno 640 (1080 x 2280)
Mali G710 (1080 x 2400)
Weirdly the Mali G710 is a much more powerful device (Pixel 7 vs Pixel 4). But the Adreno 640 wins here because Adreno devices can basically turn off TBDR rendering and go fully immediate mode for workloads like this.
As you can see, performance on Mali devices scales very closely with the number of render passes used.
Adreno 530 (1080 x 1920)
Intel Xe graphics (1152 x 648)
Before (all levels, glow): 0.24 ms
Before (all levels, tonemap): 0.5 ms
Before (glow off, tonemap only): 0.19 ms
After (all levels, glow): 0.13 ms
After (all levels, tonemap: 0.14 ms
After (glow off, tonemap only): 0.12 ms
Forward+ (all levels, glow): 0.4 ms
Forward+ (all levels, tonemap): 0.45 ms
I'm surprised to see such a big win here!
M2 MBP (2400 x 1366 + 2.0x scale_3d)
Note, the timings for glow were pretty variable and could swing by about 50% between runs. I tried to take the worst runs from the after and the best runs from before to keep things fair. In any case, the new code is consistently an order of magnitude faster while the tonemapping code is more like 3x faster.
TODO
Notes
Using the sampling functions from https://www.shadertoy.com/view/mdsyDf
Using the technique from https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/siggraph2015_2D00_mmg_2D00_marius_2D00_notes.pdf
Next Steps
In another PR, we can copy this approach for both the compatibility renderer and the Forward+ renderer. The compatibility renderer approach is already based on https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/siggraph2015_2D00_mmg_2D00_marius_2D00_notes.pdf, but we can optimize it a lot more