Locale & wide character strangeness

Romick yellowrabbit2010 at gmail.com
Sat May 28 02:56:23 PDT 2016


On Thu, May 26, 2016 at 06:58:12AM +0800, David Adam wrote:
> Hello,
> 
> I help work on fish-shell, an alternative command-line shell. Recently 
> we've had some reports of strange behaviour on newer versions of DragonFly 
> BSD, which as far as I can tell come down to unusual behaviours of wide 
> character functions in the UTF-8 locale.

Hello,
I looked at the parser in fish-shell. You use special characters directly
in the input stream to mark different things, such as BRACKET_BEGIN,
BRACKET_END, BRACKET_SEP, INTERNAL_SEPARATOR and so on.

This works until you meet a locale in which those characters are
full members of the alphabet.
You see, the Unicode range is 0x0 to 0x10FFFF, and the
INTERNAL_SEPARATOR character has the code 0xFDD7, which lies inside that range.
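
A quick way to see how a given system treats such a marker is to ask the C
library directly. The sketch below is only an illustration: the helper names
libc_says_alnum() and use_environment_locale() are made up for this example
and are not part of fish-shell; setlocale() and iswalnum() are the standard
C functions.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: switch from the default "C" locale to the one
 * named by the environment (LANG, LC_ALL, ...).
 * Returns 1 on success, 0 if the locale could not be set. */
int use_environment_locale(void) {
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: returns 1 if the C library classifies the code
 * point as alphanumeric in the current locale, 0 otherwise. */
int libc_says_alnum(wint_t cp) {
    return iswalnum(cp) != 0;
}
```

Calling use_environment_locale() and then libc_says_alnum(0xFDD7) shows what
the running system thinks of INTERNAL_SEPARATOR: a result of 1 means the
marker is being treated as a letter or digit. The answer depends on the
platform and the locale, so no expected output is given here.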

In DragonFly BSD, the iswalnum() function checks all locales simultaneously, so
you have three choices:
1) use your own iswalnum():
===
diff --git a/src/common.h b/src/common.h
index e59dfc0..e8c01c3 100644
--- a/src/common.h
+++ b/src/common.h
@@ -769,4 +769,8 @@ __attribute__((noinline)) void debug_thread_error(void);
 /// specified base, return -1.
 long convert_digit(wchar_t d, int base);
 
+inline int iswalnum(wchar_t chr) {
+	return((chr >= L'a' && chr <= L'z') || (chr >= L'A' && chr <= L'Z') || iswdigit(chr));
+}
+
 #endif
===

2) use bigger values for your special characters (I have not tested this).

3) something else :)
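
For choice 1, a variant that does not reuse the name of the libc function
might look like the sketch below. The name ascii_iswalnum() is hypothetical,
and the code assumes a wide character set in which a-z, A-Z and 0-9 are each
contiguous (true for ASCII and Unicode based encodings). It has only been
reasoned through, not tested against fish-shell.

```c
#include <wchar.h>

/* Hypothetical replacement for iswalnum() that accepts ASCII letters and
 * digits only, regardless of the current locale. A distinct name avoids
 * clashing with the iswalnum() declared in <wctype.h>. */
static inline int ascii_iswalnum(wchar_t chr) {
    return (chr >= L'a' && chr <= L'z') ||
           (chr >= L'A' && chr <= L'Z') ||
           (chr >= L'0' && chr <= L'9');
}
```

With this check a marker such as 0xFDD7 is never reported as alphanumeric.
The trade-off is that genuine non-ASCII letters are no longer reported as
alphanumeric either, which may or may not be acceptable for the parser.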

   Good luck
    
-- 
  with best regards, Yellow Rabbit
  DragonFly 4.5-DEVELOPMENT x86_64


