git: libc/regex: Replace old regex library with modified TRE

John Marino marino at crater.dragonflybsd.org
Thu Aug 6 15:15:30 PDT 2015


commit 6af9a77b394698e42f3a7ec6126497a3fc2fd470
Author: John Marino <draco at marino.st>
Date:   Thu Aug 6 23:26:49 2015 +0200

    libc/regex: Replace old regex library with modified TRE
    
    The existing DragonFly REGEX library has several limitations, including
    lack of wide character support and no collation ability due to its being
    locked to POSIX/C locale.  It's also slow and doesn't pass a number of
    tests of the AT&T Research Regex testsuite:
    
       basic       : TEST testregex, 539 tests,  0 errors
       categorize  : TEST testregex,  20 tests,  0 errors
       nullsubexpr : TEST testregex,  84 tests, 31 errors
       leftassoc   : TEST testregex,  12 tests, 12 errors
       rightassoc  : TEST testregex,  24 tests,  0 errors
       forcedassoc : TEST testregex,  48 tests,  8 errors
       repetition  : TEST testregex, 129 tests, 37 errors
    
    With the modified TRE library it achieves these scores (the test
    counts are higher because the new regnexec support lets more of the
    suite run):
    
       basic       : TEST testregex, 808 tests,  0 errors
       categorize  : TEST testregex,  26 tests,  0 errors
       nullsubexpr : TEST testregex, 172 tests,  0 errors
       leftassoc   : TEST testregex,  12 tests, 12 errors
       rightassoc  : TEST testregex,  36 tests,  0 errors
       forcedassoc : TEST testregex,  84 tests,  0 errors
       repetition  : TEST testregex, 241 tests,  0 errors
    
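    Here's proof that the regex library is now locale sensitive.  Under
    LANG=C, [a-z] matches only ASCII letters, so the accented "abandonn"
    entries survive; under fr_FR they match and are deleted too: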
    
    > env LANG=C sed /abandonn[a-z]/d fwl-sort-C.txt
    a
    abandonnâmes
    abandonnât
    abandonnâtes
    abandonnèrent
    abandonné
    abandonnée
    abandonnées
    abandonnés
    abord
    abords
    absence
    
    > env LANG=fr_FR sed /abandonn[a-z]/d fwl-sort-C.txt
    a
    abord
    abords
    absence
    accepta
    acceptai
    acceptaient
    acceptais
    acceptait
    acceptant
    acceptas
    acceptasse
    
    Several new functions have been added to libc:
    
      variations of regcomp: regcomp_l,
        regncomp,  regncomp_l,
        regwcomp,  regwcomp_l,
        regnwcomp, regnwcomp_l
    
      variations of regexec: regnexec, regwexec, regwnexec
    
    The regex.3 and re_format.7 man pages have been updated and symlinked
    accordingly.

Summary of changes:
 include/Makefile                          |    2 +-
 include/regex.h                           |  120 --
 lib/libc/Makefile.inc                     |    2 +-
 lib/libc/regex/COPYRIGHT                  |   56 -
 lib/libc/regex/Makefile.inc               |   19 -
 lib/libc/regex/Symbol.map                 |    6 -
 lib/libc/regex/WHATSNEW                   |   94 --
 lib/libc/regex/cname.h                    |  139 ---
 lib/libc/regex/engine.c                   | 1186 -------------------
 lib/libc/regex/regcomp.c                  | 1823 -----------------------------
 lib/libc/regex/regerror.c                 |  170 ---
 lib/libc/regex/regex2.h                   |  193 ---
 lib/libc/regex/regexec.c                  |  229 ----
 lib/libc/regex/regfree.c                  |   85 --
 lib/libc/regex/utils.h                    |   54 -
 lib/libc/tre-regex/Makefile.inc           |   42 +
 lib/libc/tre-regex/Symbol.map             |   23 +
 lib/libc/tre-regex/cname.h                |  140 +++
 lib/libc/tre-regex/config.h               |  259 ++++
 lib/libc/{regex => tre-regex}/re_format.7 |  354 +++++-
 lib/libc/{regex => tre-regex}/regex.3     |  327 +++++-
 lib/libc/tre-regex/regex.h                |  232 ++++
 lib/libc/tre-regex/tre.h                  |    8 +
 23 files changed, 1321 insertions(+), 4242 deletions(-)
 delete mode 100644 include/regex.h
 delete mode 100644 lib/libc/regex/COPYRIGHT
 delete mode 100644 lib/libc/regex/Makefile.inc
 delete mode 100644 lib/libc/regex/Symbol.map
 delete mode 100644 lib/libc/regex/WHATSNEW
 delete mode 100644 lib/libc/regex/cname.h
 delete mode 100644 lib/libc/regex/engine.c
 delete mode 100644 lib/libc/regex/regcomp.c
 delete mode 100644 lib/libc/regex/regerror.c
 delete mode 100644 lib/libc/regex/regex2.h
 delete mode 100644 lib/libc/regex/regexec.c
 delete mode 100644 lib/libc/regex/regfree.c
 delete mode 100644 lib/libc/regex/utils.h
 create mode 100644 lib/libc/tre-regex/Makefile.inc
 create mode 100644 lib/libc/tre-regex/Symbol.map
 create mode 100644 lib/libc/tre-regex/cname.h
 create mode 100644 lib/libc/tre-regex/config.h
 rename lib/libc/{regex => tre-regex}/re_format.7 (56%)
 rename lib/libc/{regex => tre-regex}/regex.3 (69%)
 create mode 100644 lib/libc/tre-regex/regex.h
 create mode 100644 lib/libc/tre-regex/tre.h

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/6af9a77b394698e42f3a7ec6126497a3fc2fd470


-- 
DragonFly BSD source repository


