Locale & wide character strangeness

David Adam zanchey at ucc.gu.uwa.edu.au
Mon May 30 11:29:57 PDT 2016


Hi Romick,

FWIW, iswalnum() returns zero for those special characters, as they are in
the Unicode private use area. My colleagues have tested this across
multiple platforms.

So, I don't think that's the problem.
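
For anyone who wants to repeat that kind of check on their own platform, a
minimal sketch of a probe follows. It is not necessarily the test that was
run: the helper name probe_iswalnum is made up for illustration, and 0xFDD7
is the INTERNAL_SEPARATOR code quoted below.

```cpp
#include <clocale>
#include <cwctype>

// Code point quoted in this thread for fish's INTERNAL_SEPARATOR.
constexpr wchar_t kInternalSeparator = static_cast<wchar_t>(0xFDD7);

// Hypothetical helper: switch the process to the named locale, then ask
// libc's iswalnum() about a single wide character.
// Returns 1 if libc says "alphanumeric", 0 if not, and -1 if the locale
// could not be selected (for example, because it is not installed).
int probe_iswalnum(const char* locale_name, wchar_t ch) {
    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return -1;
    }
    return std::iswalnum(static_cast<std::wint_t>(ch)) != 0 ? 1 : 0;
}
```

Calling probe_iswalnum("en_US.UTF-8", kInternalSeparator) on each system of
interest and comparing the results would show whether the platforms
disagree. The locale name has to be one that is installed on that system.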

Thanks for the suggestion though!

David Adam
zanchey at ucc.gu.uwa.edu.au

On Sat, 28 May 2016, Romick wrote:
> On Thu, May 26, 2016 at 06:58:12AM +0800, David Adam wrote:
> > Hello,
> > 
> > I help work on fish-shell, an alternative command-line shell. Recently 
> > we've had some reports of strange behaviour on newer versions of DragonFly 
> > BSD, which as far as I can tell come down to unusual behaviours of wide 
> > character functions in the UTF-8 locale.
> 
> I looked at the parser in fish-shell: you use special characters directly
> in the input stream to mark different things, such as BRACKET_BEGIN,
> BRACKET_END, BRACKET_SEP, INTERNAL_SEPARATOR and so on.
> 
> This is fine until you meet a locale in which those characters are
> full members of the alphabet.
> You see, the Unicode range is 0x0 to 0x10FFFF, and the character
> INTERNAL_SEPARATOR has the code 0xFDD7.
> 
> In DragonFly BSD, the function iswalnum() checks all locales
> simultaneously, so you have three choices:
> 1) use your own iswalnum():
> ===
> diff --git a/src/common.h b/src/common.h
> index e59dfc0..e8c01c3 100644
> --- a/src/common.h
> +++ b/src/common.h
> @@ -769,4 +769,8 @@ __attribute__((noinline)) void debug_thread_error(void);
>  /// specified base, return -1.
>  long convert_digit(wchar_t d, int base);
>  
> +inline int iswalnum(wchar_t chr) {
> +	return((chr >= L'a' && chr <= L'z') || (chr >= L'A' && chr <= L'Z') || iswdigit(chr));
> +}
> +
>  #endif
> ===
> 
> 2) use bigger values for your special characters (I have not tested this).
> 
> 3) something else:)
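
Option 1 above can also be sketched without reusing the libc name. This is
only an illustration under stated assumptions, not the patch itself: the
name ascii_iswalnum is hypothetical, picked so that it does not collide with
the iswalnum() declared by the C library, and the digit test is written as
an explicit range instead of calling iswdigit().

```cpp
// Hypothetical, locale-independent classifier: true only for ASCII letters
// and ASCII digits, whatever locale the process has selected.
constexpr bool ascii_iswalnum(wchar_t ch) {
    return (ch >= L'a' && ch <= L'z') ||
           (ch >= L'A' && ch <= L'Z') ||
           (ch >= L'0' && ch <= L'9');
}

// Compile-time checks: an ordinary letter passes, while the
// INTERNAL_SEPARATOR code quoted in this thread (0xFDD7) does not.
static_assert(ascii_iswalnum(L'x'), "ASCII letters are alphanumeric");
static_assert(!ascii_iswalnum(static_cast<wchar_t>(0xFDD7)),
              "0xFDD7 must not be classified as alphanumeric");
```

The trade-off is that non-ASCII letters such as accented characters are no
longer treated as alphanumeric by this helper, so it only fits places where
ASCII-only names are acceptable.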



