UTF8 locale MFC for DragonflyBSD
Matthew Dillon
dillon at apollo.backplane.com
Mon Mar 29 08:47:32 PST 2004
:> I don't really want to force us to UCS2, just because MS did. It is pretty
:> pointless if you think about Unicode as a means to encode every _written_
:> script in the world. Therefore if we want to apply any length checks, the
:> correct way is as specified by at least Unicode 3 e.g. UCS4.
:
:Well, not just MS; a lot of folks (notably Sun/Java) were caught off
:guard when Unicode was extended beyond the base 64k characters. I won't
:replicate the flame wars here, they're all on Google. :-)
:
:My personal opinion: UCS-4 wastes a lot of space given that Unicode 3.1
:is a ~21-bit set and nobody is really using the >=U+10000 space in a
:practical manner (yet?). But if you need to have a one-to-one mapping,
:you don't have much choice.
:
:Unless you have a machine which uses 21-bit bytes, of course. ;-)
UTF8 is the way we should go. I severely dislike the wasted space as
well, and it's a mistake to try to use a direct-encoding representation
when most programs already deal in 'strings' rather than 'characters'
for most things.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>
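
As a rough illustration of the space argument in this thread, here is a
minimal Python sketch (not from the original messages; the function name
encoded_sizes and the sample strings are made up for illustration). It
compares the byte length of the same text in UTF-8, UTF-16 (the
surrogate-pair successor to UCS-2), and UCS-4/UTF-32, without a byte-order
mark.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in three Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" codec variants do not prepend a byte-order mark.
        "utf-16": len(text.encode("utf-16-le")),
        "ucs-4": len(text.encode("utf-32-le")),
    }


# Hypothetical samples: ASCII, CJK, and one code point above U+FFFF.
samples = ["hello, world", "日本語", "\U00010400"]
for s in samples:
    print(repr(s), encoded_sizes(s))
```

For ASCII text UTF-8 needs one byte per character while UCS-4 always needs
four. For CJK text UTF-8 needs three bytes per character, which is more than
UTF-16 but still less than UCS-4. Code points at or above U+10000 take four
bytes in all three encodings.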