X Tutup
Skip to content

Expose missing String encoding conversion functions#104781

Merged
Repiteo merged 1 commit intogodotengine:masterfrom
Ivorforce:string-encode-complete
Sep 20, 2025
Merged

Expose missing String encoding conversion functions#104781
Repiteo merged 1 commit intogodotengine:masterfrom
Ivorforce:string-encode-complete

Conversation

@Ivorforce
Copy link
Member

@Ivorforce Ivorforce commented Mar 29, 2025

This can be seen as sort of a follow-up of recent PRs going in the same direction: I'm completing the function ensemble for string conversions with named encodings. In particular:

  • append_utf32 gets an Error return type, allowing the caller to react to poorly encoded strings.
  • append_utf32_unchecked is added (formerly copy_from_unchecked), for high performance conversion from trusted strings.
  • append_wstring is added, for conversion from platform defined strings.

I want to soon begin eliminating implicit pointer-to-string conversions, so these functions will be needed :)

With append_utf32_unchecked available, String::operator+= can use that instead of the slow append_utf32 function. It should be several times faster than before.

@Ivorforce Ivorforce added this to the 4.x milestone Mar 29, 2025
@Ivorforce Ivorforce requested a review from a team as a code owner March 29, 2025 22:31
@Ivorforce Ivorforce force-pushed the string-encode-complete branch from 2b8807f to e9369fb Compare March 29, 2025 22:44
memcpy(dst, p_char, p_length * sizeof(char32_t));
*(dst + p_length) = _null;
void String::append_utf32_unchecked(const Span<char32_t> &p_span) {
const int prev_length = length();
Copy link
Contributor

@kiroxas kiroxas Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funny that you cache length() that is called twice, but not p_span.size() that is called 3 times.

Copy link
Member Author

@Ivorforce Ivorforce Mar 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p_span.size() just compiles to an inlined field access (p_span._ptr). There's no need to cache it :)
Edit: Probably also no need to cache length() either, but prev_length has to be cached because length() changes after the resize.

@Ivorforce Ivorforce force-pushed the string-encode-complete branch from e9369fb to 5ea8a6a Compare May 6, 2025 16:03
@Ivorforce Ivorforce modified the milestones: 4.x, 4.5 May 6, 2025
@Ivorforce
Copy link
Member Author

I found that String::operator+=(String) still used the slow checked append_utf32 function. String instances should already be correct utf32, so we should use append_utf32_unchecked instead, leading to major speed ups (several times faster).

@Ivorforce Ivorforce force-pushed the string-encode-complete branch from 5ea8a6a to be55c43 Compare May 6, 2025 20:49
Comment on lines 273 to 279
// NULL-terminated c string copy - automatically parse the string to find the length.
void append_latin1(const char *p_cstr) {
append_latin1(Span(p_cstr, p_cstr ? strlen(p_cstr) : 0));
}
void append_utf32(const char32_t *p_cstr) {
append_utf32(Span(p_cstr, p_cstr ? strlen(p_cstr) : 0));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should get Error return types as well, no? I'd also group these by their Span equivalents below (changing their return type as well) to match the other functions

Copy link
Member Author

@Ivorforce Ivorforce May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are currently in the private section of String. I plan to remove these anyway, when it's possible.
I could still add an Error return type, though I don't think they're called from anywhere except the constructor at this point.

append_utf32((Span<char32_t> &)p_cstr);
#endif
}
void append_wstring(const wchar_t *p_cstr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@Ivorforce Ivorforce requested a review from Repiteo May 14, 2025 09:49
@Ivorforce
Copy link
Member Author

@Repiteo Is it fine to move ahead with this even with the functions you flagged not exposed?

Copy link
Contributor

@Repiteo Repiteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exposing the private functions isn't a dealbreaker. Looks ready to roll after a rebase

@Repiteo Repiteo modified the milestones: 4.5, 4.6 Jun 16, 2025
@Ivorforce Ivorforce force-pushed the string-encode-complete branch from be55c43 to 8023a8a Compare June 16, 2025 16:28
@Repiteo
Copy link
Contributor

Repiteo commented Sep 18, 2025

Needs rebase (again)

…:utf32_unchecked` in `String` for high performance string copies. Expose `append_wstring` and `String::wstring` for platform strings.
@Ivorforce Ivorforce force-pushed the string-encode-complete branch from 8023a8a to d1fd42b Compare September 18, 2025 17:27
@Ivorforce
Copy link
Member Author

I'm a rebase machine 😎

@Repiteo Repiteo merged commit 8eeef16 into godotengine:master Sep 20, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Sep 20, 2025

Thanks!

@Ivorforce Ivorforce deleted the string-encode-complete branch September 20, 2025 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

X Tutup