Fix potential DAP crash at startup#114196
Conversation
|
Being able to reproduce the bug and confirming this fixes things would be really important here, the check being added wont be an issue if it isn't needed but we should confirm it actually fixes something |
|
This is the stack trace, but regardless of it, the code is clearly broken: |
|
In any case, I already applied this change to our trees, I am just sharing in an attempt to improve the stability of Godot for all users. We are fortunate enough to have telemetry data that tracks crashes in the wild in difficult conditions which Godot is not getting, a small way of contributing - but I do not have the cycles to put together much more than this - I leave it up to the Godot maintainers to decide whether they want these changes. |
|
That crash log should be sufficient, thank you You're not obligated to put in any more work than you are willing, but I'm pointing out the bug fixing and triage process, to ensure work that goes into the engine is doing something useful, especially to ensure it actually fixes some issue someone has rather than the issue actually being elsewhere, it's important not to play "whack a bug" with the codebase but to be able to make informed decisions on what to merge (it's especially important to know if something actually fixes anything or if it just masks a bug elsewhere) |
6072c87 to
d3c173d
Compare
Crash seen in the wild on Xogot with telemetry.
d3c173d to
27100c7
Compare
|
Amended the commit message to follow our guidelines. |
There was a problem hiding this comment.
This seems consistent with how e.g. EditorFileSystem validates its connection, so that seems fine. There are more places in the codebase where take_connection() calls aren't always validated (notably in thirdparty/enet/enet_godot.cpp) but they're outside the scope of this PR.
AThousandShips
left a comment
There was a problem hiding this comment.
The stack trace confirms the Ref being null being the cause of the issue and it is performed elsewhere so indicates that the null state of the ref is not unexpected so would seem to be the correct fix, if later analysis indicates something else is wrong and should be fixed elsewhere this won't prevent that and we still have something indicating that state
Thanks!
|
I agree, if Lines 50 to 65 in 551ce8d Still, I agree with @AThousandShips that it would be better to get to the root of this problem if possible. I'm curious if you have a scenario with multiple connections to the DAP server. This code was copied back then from LSP at the time, but I can't think of any scenario where multiple connections is useful in DAP, and I'm pretty sure the implementation is not tackling it correctly either, so I'm curious if this the case here. If you want to, and since you can't reproduce this bug locally, we can instead have a chat on Rocket.chat if you feel this should be analyzed further. |
|
I'd suggest asking the person who reported this to @migueldeicaza to open an issue here on GitHub to look into this if they can, but getting the details would help narrowing it down, but it shouldn't crash in this case but agreed it might be an issue elsewhere that could benefit other areas if fixed |
|
Many of our crashes are only experienced by users in the wild, and we get anonymized crashes from Apple/Google which only contain the crash information, and then we stare at the stack traces and try to prevent another user from hitting the same issue. If I had to estimate, we probably get some 100 crashes like this for every case where a user reaches out to us to complain. For a mix of reasons, people do not bother, people do not think it will get fixed, people do not know what happened, people fault something else, people do not know how to reach out, people are not familiar with processes. 50 is probably generous, it could be 100. On our end, we monitor those crashes constantly and while some users might have had a bad experience, it helps future users not run into the same issue. I personally think that Godot should distribute an instrumented version of the engine to improve the stability of the engine. |
If by instrumented you mean with debug symbols you have godotengine/godot-proposals#1342 If instead meaning gathering data on crashes etc. that's very complicated and messy with data collection laws etc. |
|
I mean the latter, when you detect a crash, you upload the data to a site that aggregates it. You could make it an opt-in feature, you ask at startup "Do you want to assist developers in improving Godot?" and then if they say yes, you activate the crash reporter, otherwise you do not. We are using Crashlytics from Google, and we also get data from Apple. But there is a fabulous new stack that I am tempted to use from Bitdrift - which also solves other problems. |
|
I'd suggest opening a proposal, but there's a lot of practical aspects, including how much larger the target audience for Godot is compared with individual projects using Godot, making our handling of the data problematic from a practical perspective (beyond just the legal aspects, and the reasonable aversion of a lot of people to different software collecting data), related concepts and proposals have been discussed in the past and as I recall there was a relatively large push back to the collection of data (then we have the potential issues of forks of the engine pushing telemetry to our data servers causing issues etc.), and we might not get a lot of data for the same reasons of aversion (I wouldn't enable the feature myself for example) |
Well we already have a system like this in place ;) |
|
Thanks! |
|
@syntaxerror247 godot should publish that data, it would help prioritize bugs that are affecting the engine stability. |
Yeah, having more eyes on those crashes may help, the android team once thought about creating github issues for some frequent crashes but never got to this, but IMO that doesn't really make sense since reports won't have reproduction steps (which are very important) and most crashes are only on low-powered devices. There's already some issues opened with crash details from play console but those are still unresolved as we don't really know the root cause. That said, the team is keeping an eye on them and is creating PRs where possible. |
|
I find myself on the same spot, initially not knowing what triggers the crash or the steps, but sorting this out is not too hard. |
This also appeared on our crash telemetry, I have not been able to reproduce, but I suspect it is a timing issue when the client has gone away before the server connects.