ResourceLoader: Fix potential infinite recursion in progress reporting#113114
Conversation
46dcc5e to
17ac9b3
Compare
DarioSamo
left a comment
There was a problem hiding this comment.
This fixes the issue for me, although I think there's potentially a better way to fix it by addressing the root cause. However, this should address a critical issue present at the moment that makes the threaded background loader not viable to ship otherwise.
|
Elaborating, this fix is natural to the current approach to progress computation. The current algorithm doesn't isolate different load trees so a given dependency being loaded, present in multiple ones, can contribute to the progress report of both trees. This implies that cycles can happen, something we hadn't add a countermeasure against yet. The "root cause," as I see this, is precisely the whole current approach to progress reporting. There may be a better way to do it, which is more direct or freer or pathological cases. |
17ac9b3 to
84dc324
Compare
|
CICD complains |
84dc324 to
8480dc2
Compare
Ivorforce
left a comment
There was a problem hiding this comment.
I'm not sure I understand what's causing cycles here, but the addition of the bool is cheap, so it's a simple fix.
I assume you have made sure that the function cannot be called / isn't called on multiple threads at once.
core/io/resource_loader.h
Outdated
|
|
||
| bool awaited : 1; // If it's in the pool, this helps not awaiting from more than one dependent thread. | ||
| bool need_wait : 1; | ||
| bool in_progress_check : 1; |
There was a problem hiding this comment.
I'd add a comment like // Measure against recursion cycles in progress reporting. Cycles are not expected, but can happen due to how it's currently implemented.
There was a problem hiding this comment.
I'm adding an explanation, not exactly that one.
8480dc2 to
3596d75
Compare
3596d75 to
5806e3c
Compare
This function is only called with the lock held. |
It is thread-safe as far as data races go, but there's a random factor from said threads affecting the "stack" in the relevant code that means there's a good chance child resources will add a parent resource as a subtask, leading to an infinite loop. For example, a .mesh ends up adding a sub-task of a .tscn being considered a sub-task, and said .tscn has the subtask of loading the same mesh. This is what led to this being very random and hard to track when I investigated it and why it's not easy to make a consistent MRP for it. Pedro claims fixing this scenario would require a redesign, and I'm not really sure I follow along the logic of the code too well to understand how it works yet, but fixing it this way is certainly better than leaving it unattended. Since the function is used only for cosmetic purposes (computing the progress), it should be a better outcome than it causing a crash from stack overflow. |
|
Thanks! |
The issue has been found by @DarioSamo.
The problem is that the boundaries between multiple dependency trees are trespassed for progress reporting, which is not bad per se, but that can lead to cycles, which in turn lead to infinite recursion.