Use AncestralClass to speed up Object::cast_to when possible.#110763
Use AncestralClass to speed up Object::cast_to when possible.#110763Repiteo merged 1 commit intogodotengine:masterfrom
AncestralClass to speed up Object::cast_to when possible.#110763Conversation
d3257d6 to
ac579a9
Compare
AncestralClass to speed up Object::cast_to when possible.AncestralClass to speed up Object::cast_to when possible.
88a9323 to
c5239f6
Compare
c5239f6 to
96619d4
Compare
|
For completion's sake, I tested the performance impact of this change. It's about 2x on hot access, but would probably be better on cold access. See main post for details. |
|
Thanks! |
|
This pull is causing CollisionShape3D errors. |
|
This pull cause both 3D and 2D collision shape not detect by body. |
|
Can confirm that removing Just investigating why it is causing a problem (maybe a conflict for derived classes where it is trying to use ancestry). I suspect this line isn't working correctly for some inheritance chains, but I haven't figured it out yet: Interesting that this problem doesn't seem to occur in 3.x, possibly because we are doing the same thing longhand and checking ancestry directly where it makes sense, or perhaps the inheritance chains are different. Will leave this to @Ivorforce when he's up. If he's unavailable today I might be able to have a look at a fix, or we can just revert temporarily to prevent users having broken projects when using master, as the bug is quite annoying. |
|
Got it, it's actually my bad, rather than a problem with this PR, it's a problem with #107868 (which was merged just before this one, but this one exposed it).
|
@harrisyu Although I've confirmed the bug with 3D, I haven't worked out yet how this could be affecting 2D collision shapes, do you have an MRP that shows this? (or even let me know if one of the demos isn't working correctly) |
This comment was marked as resolved.
This comment was marked as resolved.
Yes, The choice of which to accelerate was based on empirical testing, but we may add more in future. |
Sorry to see that late, I see first pull changed to 2D and 3D, but I only test 3D. |
Object::cast_towith ancestral classes when possible #107844Objectancestry #107462This implementation is 2x faster (hot access, probably more on cold access) than regular
Object::cast_to, for classes with corresponding ancestral classes.It would be possible to "pre-check" the types that derive from any of the ancestral types (e.g.
object && object.has_ancestry(AncestralClass::SPATIAL) && object.is_class_ptr(SpatialSubclass::get_class_ptr_static())). But that would only accelerate the unhappy path, and only some of the time. The happy path wants to immediately useis_class_ptr, so I decided against this micro optimization.I also made
has_ancestrya private function. Callers should generally usederives_from/Object::cast_toinstead, which is more resistant to us changing the ancestral classes in the future.Benchmark
The benchmarks show a ~2x improvement for cold access. The improvement is likely to be better in reality because we generate more hot situations, and need less cold RAM accesses even in cold situations.
Note that the difference is small in absolute terms, due to
cast_toalready being quite well optimized.Object *nav_agent = memnew(NavigationAgent3D); Object *node_3D = memnew(Node3D); { auto t0 = std::chrono::high_resolution_clock::now(); for (int i = 0; i < 1000000000; i++) { // Test happy and optimized get_ptr_node3d(node_3D); } auto t1 = std::chrono::high_resolution_clock::now(); std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n"; } { auto t0 = std::chrono::high_resolution_clock::now(); for (int i = 0; i < 1000000000; i++) { // Test happy but unoptimized get_ptr_navagent(nav_agent); } auto t1 = std::chrono::high_resolution_clock::now(); std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n"; } { auto t0 = std::chrono::high_resolution_clock::now(); for (int i = 0; i < 1000000000; i++) { // Test unhappy but optimized get_ptr_node3d(nav_agent); } auto t1 = std::chrono::high_resolution_clock::now(); std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n"; } { auto t0 = std::chrono::high_resolution_clock::now(); for (int i = 0; i < 1000000000; i++) { // Test unhappy and unoptimized get_ptr_navagent(node_3D); } auto t1 = std::chrono::high_resolution_clock::now(); std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n"; }With the helper functions being declared in a separate module (to avoid inlining):