This article has moved to the official .NET Docs site.
See https://docs.microsoft.com/dotnet/standard/base-types/character-encoding-introduction.
Three years ago, I wrote an article in Japanese about the history of Unicode (both Unicode itself and .NET characters). The diagrams and illustrations in that article were drawn in PowerPoint. I hope this pptx helps you.
I think this is a very good write-up. There's one aspect I disagree with, however: the recommendation to use char instead of Rune when you're sure the character is representable as a single UTF-16 code unit. I think this unnecessarily complicates the mental model, and it also makes it harder to switch the backing encoding of a string (say, to a Utf8String) without breaking code. Going forward, I think it makes more sense to avoid treating char as an entire character, even when it is known to be one. When searching for a character in a string, users shouldn't have to look up whether that character is in the BMP when it is simpler to just use Rune.
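To make the searching point concrete, here is a minimal sketch, assuming .NET Core 3.0 or later (where `System.Text.Rune` and `string.EnumerateRunes()` are available). The class and the `CountRune` helper are hypothetical names used only for illustration; they are not part of the article or the framework.

```csharp
using System;
using System.Text;

public static class RuneSearchSketch
{
    // Hypothetical helper: counts how many times a Unicode scalar value
    // occurs in a string. The caller does not need to know whether the
    // target is inside or outside the BMP.
    public static int CountRune(string text, Rune target)
    {
        int count = 0;

        // EnumerateRunes() decodes the UTF-16 code units into scalar values,
        // so a surrogate pair is seen as one Rune rather than two chars.
        foreach (Rune r in text.EnumerateRunes())
        {
            if (r == target)
            {
                count++;
            }
        }

        return count;
    }
}
```

The same call works for a BMP character such as `new Rune('a')` and for a supplementary character such as `new Rune(0x1F600)`, which occupies two char values in a string. Code written this way also makes no assumption that the backing storage is UTF-16.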
I'm also trying to work an "aBOMination" pun in here somewhere, but as of yet to no avail.