UTF8 locale MFC for DragonflyBSD

Matthew Dillon dillon at apollo.backplane.com
Mon Mar 29 08:47:32 PST 2004


:> I don't really want to force us to UCS2, just because MS did. It is pretty
:> pointless if you think about Unicode as a means to encode every _written_
:> script in the world. Therefore if we want to apply any length checks, the
:> correct way is as specified by at least Unicode 3 e.g. UCS4.
:
:Well, not just MS; a lot of folks (notably Sun/Java) were caught off 
:guard when Unicode was extended beyond the base 64k characters.  I won't 
:replicate the flame wars here, they're all on Google. :-)
:
:My personal opinion: UCS-4 wastes a lot of space given that Unicode 3.1 
:is a ~21-bit set and nobody is really using the >=U+10000 space in a 
:practical manner (yet?).  But if you need to have a one-to-one mapping, 
:you don't have much choice.
:
:Unless you have a machine which uses 21-bit bytes, of course. ;-)

    UTF8 is the way we should go.  I severely dislike the wasted space as
    well, and it's a mistake to try to use a direct-encoding representation
    when most programs already deal in 'strings' rather than 'characters'
    for most things.
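
To put rough numbers on the space argument, here is a minimal sketch (not part of the original message) that compares how many bytes a few sample strings take in UTF-8 versus a fixed 4-byte-per-code-point encoding. The function name `encoded_sizes` and the sample strings are made up for illustration, and Python's "utf-32-be" codec is assumed to stand in for UCS-4.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, ucs4_bytes) for the given string."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-32-be" writes no byte-order mark, so this is exactly
    # 4 bytes per code point, standing in for UCS-4 here.
    ucs4_bytes = len(text.encode("utf-32-be"))
    return utf8_bytes, ucs4_bytes


# Hypothetical sample strings, chosen to cover 1-, 2-, 3- and 4-byte
# UTF-8 sequences.
samples = {
    "ascii": "hello, world",
    "latin": "na\u00efve",
    "cjk": "\u65e5\u672c\u8a9e",
    "supplementary": "\U0001d11e",  # a code point above U+FFFF
}

for name, text in samples.items():
    utf8_bytes, ucs4_bytes = encoded_sizes(text)
    print(f"{name:14} code points={len(text):2} "
          f"utf8={utf8_bytes:2} ucs4={ucs4_bytes:2}")
```

For plain ASCII text the fixed-width form is four times larger, and only for code points at or above U+10000 do the two come out the same size. Code that passes whole strings around never needs the one-to-one character mapping that the extra space buys.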

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




