X Tutup
Skip to content

Add HashSet.duplicate()#107377

Open
aaronfranke wants to merge 1 commit intogodotengine:masterfrom
aaronfranke:hashset-duplicate
Open

Add HashSet.duplicate()#107377
aaronfranke wants to merge 1 commit intogodotengine:masterfrom
aaronfranke:hashset-duplicate

Conversation

@aaronfranke
Copy link
Member

Follow-up to #106537 (comment)

HashSet has a copy constructor, but it's not obvious whether the constructor is truly copying the underlying data just from reading the usage. This makes the intent clearer, at the cost of adding a very small function to HashSet. There is already a similar API in Vector<> for duplicating those.

@aaronfranke aaronfranke added this to the 4.x milestone Jun 10, 2025
@aaronfranke aaronfranke requested review from a team as code owners June 10, 2025 19:32
@Ivorforce Ivorforce removed the request for review from a team June 10, 2025 19:53
Copy link
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some thoughts about this.
First off, I think this is generally a great idea.

What we have currently is the implicit copy constructor, which is very disingenuous about its real cost:

// Whoops, actually extremely expensive, because all strings are copied!
HashSet<String> set = p_hash_set;

It would already be much better to make copy constructors explicit only:

// Now it's somewhat clear we're making a new hash set.
HashSet<String> set = HashSet(p_hash_set);

However, I do prefer the .duplicate you suggest, even if it is a little non-C++ style.

Still, our priority for things like this needs to be consistency. If we're adding .duplicate, we need to remove the copy constructor. Otherwise, both APIs will be used, and it won't be clear which function is preferred.

Likewise, we can't just add .duplicate to one or two isolated container types. Otherwise, people will always have to guess if it's "one of those containers with .duplicate" or "one of those other containers with a copy constructor". We'd need to add them to all containers, and remove the copy constructors, in one fell swoop.

All of this makes this a bit bigger than a few innocuous lines, and it would need to be agreed on by the core team, in my opinion.

There is already a similar API in Vector<> for duplicating those.

That's a good point. I think the reason it exists for Vector is that it's exposed to GDScript (and it's probably in the codebase for consistency). It acts the same as the copy constructor from C++.
However, in GDScript it acts different from the copy constructor, because regular assignment references the original array, rather than making a CoW copy (like .duplicate). This creates a need for .duplicate, which likely prompted this function to be added in the first place.

@aaronfranke
Copy link
Member Author

@Ivorforce What is the harm in adding the API before removing the copy constructor? It would be nice to have both existing in the codebase so that third-party modules can support multiple Godot versions. I'm not saying we should delay removing the copy constructor for that use case, I'm just saying I don't think that should hold everything up. Plus, this way, it is easier to cherry-pick the addition of the functions to older branches if needed.

As for confusion on which to use, if the plan is to remove one, then... it doesn't matter, we'll migrate them all one way or another, since that migration will be forced to happen when removing the copy constructors.

We could definitely add the function to other container types, yeah. Which ones did you have in mind?

@aaronfranke
Copy link
Member Author

This PR's deprecation of the implicit duplication has caught a bug in the Wayland code: #101774 (comment)

I suppose that's another argument in favor of merging this in sooner rather than later :)

@Ivorforce
Copy link
Member

Ivorforce commented Sep 24, 2025

@aaronfranke I think for now, we should simply make the existing constructor explicit.

While .duplicate is a design pattern that has some merits, it's a bit of a paradigm shift and would need discussion and effort to make our APIs consistent in that regard. If you want to pursue this, I'd recommend starting by writing a proposal, so we can start discussing.

On the other hand, making the existing constructor explicit has similar benefits, but it would be a simple improvement we could merge right now.

@aaronfranke
Copy link
Member Author

aaronfranke commented Sep 25, 2025

@Ivorforce Attempting to do that has revealed a new problem. In functions like GDScript::get_all_dependencies(), GLTFDocument::get_supported_gltf_extensions_hashset(), and a dozen others, it is constructing the object being returned, so ideally we expect it to just be trivially moved to its destination. However, the presence of our explicit copy constructor means that instead a non-trivial copy is being performed in cases like these. If we were to remove the copy constructor entirely, that would allow us to use the implicit "move constructor", which I just now learned existed. But if we define a copy constructor, the implicit move constructor is deleted, and we would have to re-add it. A simpler option is to just delete the copy constructor and instead add a duplicate() function.

Another thing: Intellisense can find all uses of a function, but doing the same with constructors is much harder.

@Ivorforce
Copy link
Member

Ivorforce commented Sep 25, 2025

@Ivorforce Attempting to do that has revealed a new problem. In functions like GDScript::get_all_dependencies(), GLTFDocument::get_supported_gltf_extensions_hashset(), and a dozen others, it is constructing the object being returned, so ideally we expect it to just be trivially moved to its destination.

Yea... this is a problem we'll have to deal with eventually anyway. HashSet needs a move constructor, and whenever something is returned, it should be moved rather than copied.
I'm happy to do this, just let me know.

However, the presence of our explicit copy constructor means that instead a non-trivial copy is being performed in cases like these. If we were to remove the copy constructor entirely, that would allow us to use the implicit "move constructor", which I just now learned existed. But if we define a copy constructor, the implicit move constructor is deleted, and we would have to re-add it. A simpler option is to just delete the copy constructor and instead add a duplicate() function.

I don't think C++ can generate a correct trivial copy constructor for HashSet. If it were to generate one (which I don't think it would, as long as we have either any constructor or any value set to defaults), it would copy the pointers. However, HashSet owns its arrays, so it would need a copy constructor that copies the underlying arrays.

Another thing: Intellisense can find all uses of a function, but doing the same with constructors is much harder.

That may be true, and it's a good argument for the .duplicate pattern. But again, our number one priority is consistency. Currently, we use constructors quite extensively, and it would be surprising if some constructors were constructors and some were factory methods. It would probably lead to users using the wrong one, leading to inefficiencies. If we want to switch, we'll need to make a coordinated switch and change our paradigm throughout the codebase (or at the very least, consistently for this class and similar classes). Changing just one instance will lead to problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

X Tutup