UTF-8 string normalization

Hi,

While developing a plug-in internal preset management system, I noticed that strings received from text edits use different Unicode normalization forms depending on the platform. For example, the character “ä” is encoded as 0x61CC88 on Mac (the letter “a” followed by a combining diaeresis) and as 0xC3A4 on Windows (a single precomposed code point). So on Mac VSTGUI works with decomposed umlaut characters, whereas on Windows precomposed characters are used. This becomes a problem as soon as you want to store strings persistently in a platform-independent way. To work around this, I modified the function CocoaTextEdit::getText() to return the precomposedStringWithCanonicalMapping instead of the decomposedStringWithCanonicalMapping. This resolves the issue and gives a consistent, platform-independent UTF-8 representation.
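
To make the two byte sequences concrete, here is a minimal sketch in Python (not VSTGUI code; the variable names are only for illustration). It assumes that the standard library's `unicodedata.normalize` with "NFC" and "NFD" corresponds to the canonical precomposed and decomposed mappings mentioned above:

```python
import unicodedata

# "ä" as a single precomposed code point (U+00E4), as seen on Windows
precomposed = "\u00e4"
# "a" followed by COMBINING DIAERESIS (U+0308), as seen on Mac
decomposed = "a\u0308"

# The two strings render identically but differ byte-wise in UTF-8
precomposed_bytes = precomposed.encode("utf-8")  # bytes C3 A4
decomposed_bytes = decomposed.encode("utf-8")    # bytes 61 CC 88

# Normalizing converts one form into the other
as_nfc = unicodedata.normalize("NFC", decomposed)   # composed form
as_nfd = unicodedata.normalize("NFD", precomposed)  # decomposed form

print(precomposed_bytes.hex(), decomposed_bytes.hex())
print(as_nfc == precomposed, as_nfd == decomposed)
```

Whichever form is chosen, the point is that both platforms agree on one of them before the string is stored.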

My question now is: why does VSTGUI always return the decomposedStringWithCanonicalMapping? Is there a reason why this odd behaviour is necessary?

Regards,

Joscha

Hi Joscha,
it’s not odd behaviour. It’s the same normalization the filesystem on macOS is using. I think the bug I fixed when I changed this to the current form was the following: a user pasted precomposed text for a filename into a text field, the text was used to create a new file, and trying to open the file later with this precomposed string failed.
What exact issue did you have when storing the string?

Cheers,
Arne

Hi Arne,

okay, this is an interesting fact. Thanks for the info, I really had no idea why one would want to explicitly use decomposed UTF-8 strings. We encountered a problem with the new plug-in internal preset management system we are currently developing. When a user created a preset with some meta information, for example Author: “Andreas Schröder”, on a Windows system and afterwards created some new presets on Mac with the same author name, the preset management system was unable to match the string “Andreas Schröder” created on Windows with the same string created on Mac. As a result, the preset management system treated the two versions of “Andreas Schröder” as two different authors. This happened only because the ö was precomposed on Windows and decomposed on Mac. We need platform-independent handling of UTF-8 strings to ensure that typing the same word always yields the same encoding, which is now the case since we switched to precomposed strings on Mac.
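
The author mismatch can be reproduced outside the plug-in with a short Python sketch. The helper name `canonical_key` is hypothetical and not part of VSTGUI or our preset system; it only illustrates normalizing both strings to one form (NFC here, as an assumption matching the precomposed choice described above) before comparing or storing them:

```python
import unicodedata

def canonical_key(text: str) -> str:
    """Return the NFC (precomposed) form, used here as the comparison key."""
    return unicodedata.normalize("NFC", text)

# Same visible name, entered on two platforms
windows_author = "Andreas Schr\u00f6der"   # precomposed ö (U+00F6)
mac_author = "Andreas Schro\u0308der"      # o + combining diaeresis (U+0308)

# A plain comparison treats them as two different authors
raw_match = windows_author == mac_author

# Comparing the normalized keys matches them as one author
normalized_match = canonical_key(windows_author) == canonical_key(mac_author)

print(raw_match, normalized_match)
```

Normalizing at the point where strings are stored or compared would be an alternative to changing what the text edit returns, and would leave the decomposed form intact for file names.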

I hope this clarifies our issue.


Regards,

Joscha

Hi Arne,

I tested copying a string from Finder into our modified VSTGUI text edit, which uses precomposed UTF-8 strings, and it works perfectly fine here. I am using Mojave on an APFS-formatted drive.

Regards,

Joscha