Locale & wide character strangeness

Romick yellowrabbit2010 at gmail.com
Sat May 28 02:56:23 PDT 2016

On Thu, May 26, 2016 at 06:58:12AM +0800, David Adam wrote:
> Hello,
> I help work on fish-shell, an alternative command-line shell. Recently 
> we've had some reports of strange behaviour on newer versions of DragonFly 
> BSD, which as far as I can tell come down to unusual behaviours of wide 
> character functions in the UTF-8 locale.

I looked at the parser in fish-shell. You use special characters directly
in the input stream to mark different things, such as BRACKET_BEGIN.

This is fine until you meet a locale in which those characters are
full members of the alphabet.
You see, the Unicode range is 0x0 to 0x10FFFF, and the
INTERNAL_SEPARATOR character has the code 0xFDD7.
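
To make the range argument concrete, here is a minimal sketch. The helper names are hypothetical and are not taken from fish-shell. It only shows that 0xFDD7 sits inside the Unicode code space, in the block U+FDD0..U+FDEF that the Unicode standard reserves as noncharacters. Whether a given libc then classifies such a code point as alphanumeric is up to its locale tables.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of fish-shell.

// True if cp lies inside the Unicode code space (0x0 to 0x10FFFF).
constexpr bool in_unicode_range(std::uint32_t cp) {
    return cp <= 0x10FFFF;
}

// True if cp is a Unicode noncharacter: U+FDD0..U+FDEF, or the last
// two code points of any plane (U+xxFFFE and U+xxFFFF).
constexpr bool is_noncharacter(std::uint32_t cp) {
    return in_unicode_range(cp) &&
           ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE);
}
```

With these helpers, 0xFDD7 is both in range and a noncharacter, while 0x110000 is outside the code space altogether.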

In DragonFly BSD the iswalnum() function checks all locales simultaneously, so
you have three choices:
1) use your own iswalnum():
diff --git a/src/common.h b/src/common.h
index e59dfc0..e8c01c3 100644
--- a/src/common.h
+++ b/src/common.h
@@ -769,4 +769,8 @@ __attribute__((noinline)) void debug_thread_error(void);
 /// specified base, return -1.
 long convert_digit(wchar_t d, int base);
+inline int iswalnum(wchar_t chr) {
+	return((chr >= L'a' && chr <= L'z') || (chr >= L'A' && chr <= L'Z') || iswdigit(chr));
+}

2) use bigger values for your special characters, e.g. outside the Unicode
range (I have not tested this).

3) something else :)
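
As a rough sketch of the first two choices, assuming nothing about how fish-shell would actually integrate them: the classifier below uses the hypothetical name ascii_iswalnum so it cannot clash with the libc declaration of iswalnum(), and it spells out the digit range instead of calling iswdigit(). The constant kMarkerBase is an assumed value for illustration, not one used by fish-shell.

```cpp
#include <cstdint>

// Choice 1, under a hypothetical name: treat only ASCII letters and
// digits as alphanumeric, regardless of what the locale tables say.
inline bool ascii_iswalnum(wchar_t chr) {
    return (chr >= L'a' && chr <= L'z') ||
           (chr >= L'A' && chr <= L'Z') ||
           (chr >= L'0' && chr <= L'9');
}

// Choice 2 (untested idea): start the internal markers just above the
// last Unicode code point. kMarkerBase is an assumed value and only
// fits where wchar_t is at least 32 bits wide.
constexpr std::uint32_t kMarkerBase = 0x110000;
static_assert(kMarkerBase > 0x10FFFF,
              "markers must lie outside the Unicode range");
```

With the first choice a marker such as 0xFDD7 is never reported as alphanumeric, at the cost of also rejecting genuine non-ASCII letters.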

   Good luck
  with best regards, Yellow Rabbit
  DragonFly 4.5-DEVELOPMENT x86_64

