X Tutup
Skip to content

Add depth resolve to the mobile renderer#108636

Merged
Repiteo merged 1 commit intogodotengine:masterfrom
clayjohn:MSAA-depth-mobile
Sep 17, 2025
Merged

Add depth resolve to the mobile renderer#108636
Repiteo merged 1 commit intogodotengine:masterfrom
clayjohn:MSAA-depth-mobile

Conversation

@clayjohn
Copy link
Member

Fixes: #80991
Fixes: #103425

This is needed for #78598 to move ahead.

Resolving depth textures on mobile devices is a huge pain. In general, its best to never resolve the depth texture as mobile devices do a lot of special optimizations with it so they don't want you to touch it at all.

In a perfect world, we could use a resolve attachment and have the driver resolve the depth texture for us before the memory is copied out of tile memory (which is what currently happens for color). However, doing so is only supported with an extension (which #78598 implements).

Earlier I suggested adding a subpass to the main opaque pass that would read from the MSAA depth texture and manually resolve it into an R32 buffer. This would have the benefit of keeping the MSAA texture in tile memory only and saving a lot of bandwidth. Unfortunately, doing isn't supported widely either as it is optional in Vulkan for depth textures to support usage as input attachments. Further, as it turns out, it wouldn't even help on Mali, which would fall back to doing two passes anyway https://developer.arm.com/documentation/101897/0304/Fragment-shading/Multipass-rendering.

Therefore, the only option that is supported everywhere is to do a manual resolve of the depth texture in its own pass. Since we don't use subpasses anyway when reading from the depth texture, this didn't require much of a change in architecture.

Right now the code is structured to do the following:

  1. Render to depth
  2. Resolve depth (either in a pass or when copying back from tile memory as in Resolve depth buffer in mobile renderer when required #78598)
  3. Copy resolved depth to back buffer

With this PR I combine steps 2 and 3 and copy directly into the backbuffer. Doing so has the effect of offsetting the vast majority of the cost of doing the resolve manually.

Once this is merged, we can add #78598 as an optional path that is more optimized for devices that support the depth resolve extension.

@clayjohn clayjohn added this to the 4.6 milestone Jul 15, 2025
@clayjohn clayjohn requested a review from a team as a code owner July 15, 2025 12:51
@clayjohn clayjohn requested a review from BastiaanOlij July 15, 2025 12:51
@clayjohn clayjohn force-pushed the MSAA-depth-mobile branch from d861bbf to 38358d1 Compare July 16, 2025 09:06
Copy link
Contributor

@BastiaanOlij BastiaanOlij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Not entirely sure why CI is complaining about the resolve_effects variable hiding a class member as it looks like you're removing the class member, but might be something defined somewhere else still.

Will be nice to add in support for hardware resolve but I think this covers the most pressing use case.

@clayjohn
Copy link
Member Author

clayjohn commented Jul 17, 2025

Will be nice to add in support for hardware resolve but I think this covers the most pressing use case.

I suspect hardware resolve is faster on supported hardware. But even it may be emulated on TBDR devices. See the ARM documentation linked above. Mali devices can't merge subpasses that change the depth/stencil buffer which means that they will have to resolve the depth buffer in a separate pass anyway.

@BastiaanOlij
Copy link
Contributor

I suspect hardware resolve is faster on supported hardware. But even it may be emulated on TBDR devices. See the ARM documentation linked above. Mali devices can't merge subpasses that change the depth/stencil buffer which means that they will have to resolve the depth buffer in a separate pass anyway.

Indeed, and right now with the primary use case for which I did the original work now no longer being an issue (as everyone in XR land who actually uses this is switching to the ASW/frame synthesis approach for which a lower res depth buffer is rendered), and the remaining use cases all target devices where its questionable the API is supported and would even be faster, I don't think this is top of the priority list.

In due time, if we have a spare minute, it would be interesting to bring up my PR and do comparisons, but I don't think there is a pressing need right now.

@clayjohn clayjohn force-pushed the MSAA-depth-mobile branch from 38358d1 to a8efa55 Compare July 17, 2025 03:39
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally on Metal and MoltenVK, it works as expected. Code looks good to me.

Mac specifications
  • MacBook Pro 16 2024
  • SoC: M4 Max 16-core CPU, 40-core GPU
  • RAM: 48 GB
  • SSD: 1 TB
  • OS: macOS 15.5
master no MSAA master 2x MSAA PR 2x MSAA
Screenshot 2025-07-17 at 15 25 04 Screenshot 2025-07-17 at 15 24 50 Screenshot 2025-07-17 at 15 28 31

@dsnopek
Copy link
Contributor

dsnopek commented Jul 17, 2025

Indeed, and right now with the primary use case for which I did the original work now no longer being an issue (as everyone in XR land who actually uses this is switching to the ASW/frame synthesis approach for which a lower res depth buffer is rendered)

I think there's still potential use cases for it. For example, allowing XR_FB_composition_layer_depth_test to be used with projection layers (which would remove the need for "hole punching" and all of its caveats), or a potential future world where multiple applications can provide projection layers to be composited together

Copy link
Contributor

@Ansraer Ansraer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor nitpicks, otherwise this looks great.

void _render_buffers_copy_screen_texture(const RenderDataRD *p_render_data);
void _render_buffers_ensure_depth_texture(const RenderDataRD *p_render_data);
void _render_buffers_copy_depth_texture(const RenderDataRD *p_render_data);
void _render_buffers_copy_depth_texture(const RenderDataRD *p_render_data, bool p_use_msaa = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void _render_buffers_copy_depth_texture(const RenderDataRD *p_render_data, bool p_use_msaa = false);
void _render_buffers_copy_depth_texture(const RenderDataRD *p_render_data, bool p_used_msaa = false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So right now this shader can only resolve the depth texture. Are there any plans to support manually resolving other stuff in the future?

Because otherwise I would argue that "depth" should be mentioned in the name of this shader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to extend it to resolve other formats in the future. I'm indifferent on the naming though. If you think it is a big deal I am fine with changing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I would appreciate it if you could rename it to explicilt mention depth.

@Repiteo Repiteo merged commit 8c7c96e into godotengine:master Sep 17, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Sep 17, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hint_depth_texture depth is always 0 when msaa enabled Vulkan: Sampling Depth Texture on Forward Mobile with MSAA is corrupted

6 participants

X Tutup