git: Complete upgrade of gnu grep 2.14 => 2.20

John Marino marino at crater.dragonflybsd.org
Fri Oct 10 15:53:39 PDT 2014


commit 51ddd709576b6d603cb35ff07b103a739f875a02
Author: John Marino <draco at marino.st>
Date:   Fri Oct 10 23:33:40 2014 +0200

    Complete upgrade of gnu grep 2.14 => 2.20
    
    ** 2.20 Bug fixes
      grep --max-count=N FILE would no longer stop reading after the Nth match.
      That is, although grep still printed the correct output, it kept reading
      until the end of input, and hence potentially forever.
      [bug introduced in grep-2.19]
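A minimal sketch of the intended --max-count behavior follows. It assumes a fixed-string search and lines that fit in a 4096-byte buffer. The helper name scan_with_max_count is hypothetical and this is not grep's code. It only shows that reading should end at the Nth match rather than at end of input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan `in` line by line for the fixed string
   `needle`, copying matching lines to `out`, and stop reading as soon
   as `max_count` matches have been seen.  Returns the number of lines
   actually read, so a caller can confirm that reading stopped early. */
static long scan_with_max_count(FILE *in, FILE *out, const char *needle,
                                long max_count)
{
    char line[4096];  /* assumption: every line fits in this buffer */
    long matches = 0, lines_read = 0;

    if (max_count <= 0)
        return 0;  /* nothing to look for, so read nothing */
    while (fgets(line, sizeof line, in) != NULL) {
        lines_read++;
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            if (++matches == max_count)
                break;  /* stop here; the 2.19 bug kept reading to EOF */
        }
    }
    return lines_read;
}
```

With the input x, foo, foo, bar, foo and a limit of 2, the sketch reads only three lines and leaves the rest of the stream unread.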
    
      A command like echo aa|grep -E 'a(b$|c$)' would mistakenly
      report the input as a matched line. [bug introduced in grep-2.19]
    
    ** 2.20 Changes in behavior
      grep --exclude-dir='FOO/' now excludes the directory FOO.
      Previously, the trailing slash meant the option was ineffective.
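The following is a sketch of the fixed behavior. It assumes the option value is matched as an fnmatch(3) glob against a directory name after trailing slashes are removed from the pattern. The helper dir_is_excluded is hypothetical, and grep's actual implementation may differ.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's code: decide whether directory name
   `dir` is excluded by an --exclude-dir glob `pattern`, ignoring any
   trailing slashes in the pattern so that 'FOO/' behaves like 'FOO'. */
static bool dir_is_excluded(const char *pattern, const char *dir)
{
    size_t len = strlen(pattern);
    char *glob = malloc(len + 1);
    if (glob == NULL)
        return false;
    memcpy(glob, pattern, len + 1);
    while (len > 1 && glob[len - 1] == '/')  /* keep a lone "/" intact */
        glob[--len] = '\0';
    bool excluded = fnmatch(glob, dir, 0) == 0;
    free(glob);
    return excluded;
}
```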
    
    ** 2.19 Improvements
      Performance has improved, typically by 10% and in some cases by a
      factor of 200.  However, performance of grep -P in UTF-8 locales has
      gotten worse as part of the fix for the crashes mentioned below.
    
    ** 2.19 Bug fixes
      grep no longer mishandles patterns like [a-[.z.]], and no longer
      mishandles patterns like [^a] in locales that have multicharacter
      collating sequences so that [^a] can match a string of two characters.
    
      grep no longer mishandles an empty pattern at the end of a pattern list.
      [bug introduced in grep-2.5]
    
      grep -C NUM now outputs separators consistently even when NUM is zero,
      and similarly for grep -A NUM and grep -B NUM.
      [bug present since "the beginning"]
    
      grep -f no longer mishandles patterns containing NUL bytes.
      [bug introduced in grep-2.11]
    
      Plain grep, grep -E, and grep -F now treat encoding errors in patterns
      the same way the GNU regular expression matcher treats them, with respect
      to whether the errors can match parts of multibyte characters in data.
      [bug present since "the beginning"]
    
      grep -w no longer mishandles a potential match adjacent to a letter that
      takes up two or more bytes in a multibyte encoding.
      Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
      mishandle word-boundary matches in multibyte locales.
      [bug present since "the beginning"]
    
      grep -P now reports an error and exits when given invalid UTF-8 data.
      Previously it was unreliable, and sometimes crashed or looped.
      [bug introduced in grep-2.16]
    
      grep -P now works correctly when -w or -x is combined with
      backreferences.  Previously, echo aa|grep -Pw '(.)\1' would fail
      to match, yet echo aa|grep -Pw '(.)\2' would match.
    
      grep -Pw now works like grep -w: the matched string must be preceded
      and followed by non-word constituents or by the beginning or end of
      the line (previously it only had to lie on word boundaries).  Before,
      echo a@@a| grep -Pw @@ would match, yet echo a@@a| grep -w @@ would
      not.  Now neither matches, as documented for grep -w.
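The -w rule can be sketched for a single candidate match in a unibyte locale. The sketch assumes word constituents are letters, digits, and underscore, as grep's manual defines them. The name whole_word_ok is hypothetical. Real grep also goes on to try other matches on the line when one candidate fails, which this sketch does not do.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Word constituents: letters, digits and underscore (unibyte only). */
static bool is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Hypothetical -w check for one candidate match line[start, end):
   the match must begin at the start of the line or after a non-word
   character, and end at the end of the line or before one. */
static bool whole_word_ok(const char *line, size_t len,
                          size_t start, size_t end)
{
    bool left_ok = start == 0
        || !is_word_char((unsigned char) line[start - 1]);
    bool right_ok = end == len
        || !is_word_char((unsigned char) line[end]);
    return left_ok && right_ok;
}
```

For the line a@@a, the candidate @@ at offsets 1 through 3 fails because it is preceded by 'a', which is why both grep -w @@ and (after this fix) grep -Pw @@ reject it.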
    
      grep -i no longer mishandles patterns containing titlecase characters.
      For example, in a locale containing the titlecase character
      'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
      'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
      and 'lj' (U+01C9 LATIN SMALL LETTER LJ).
    
    ** 2.18 Bug fixes
      grep no longer mishandles patterns like [^^-~] in unibyte locales.
      [bug introduced in grep-2.8]
    
      grep -i in a multibyte, non-UTF-8 locale could be up to 200 times slower
      than in 2.16.  [bug introduced in grep-2.17]
    
    ** 2.17 Improvements
      grep -i in a multibyte locale is now typically 10 times faster
      for patterns that do not contain \ or [.
    
      grep (without -i) in a multibyte locale is now up to 7 times faster
      when processing many matched lines.
    
    ** 2.16 Bug fixes
      The fix to make \s and \S work with multi-byte white space broke
      the use of each shortcut whenever followed by a repetition operator.
      For example, \s*, \s+, \s? and \s{3} would all malfunction in a
      multi-byte locale.  [bug introduced in grep-2.15]
    
      The fix to make grep -P work better with UTF-8 made it possible for
      grep to provoke a larger set of PCRE errors, some of which could trigger
      an abort.  E.g., this would abort:
        printf '\x82'|LC_ALL=en_US.UTF-8 grep -P y
      Now grep handles arbitrary PCRE errors.  [bug introduced in grep-2.15]
    
      grep now handles very long lines (2 GiB and longer) on systems with a
      deficient read system call.
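One common workaround for a read() that mishandles very large requests is to cap the byte count passed to each call and loop until the buffer is full or end of file is reached. The sketch below assumes that approach. The cap READ_CHUNK_MAX and the helper read_capped are hypothetical, and the commit does not say exactly how grep does this.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed per-call cap, chosen to stay below INT_MAX; the value grep
   actually uses is not stated in this commit. */
#define READ_CHUNK_MAX ((size_t) 0x7ff00000)

/* Hypothetical helper: read up to n bytes from fd into buf, never asking
   read() for more than READ_CHUNK_MAX bytes at once.  Returns the number
   of bytes read (fewer than n only at end of file), or -1 on error. */
static ssize_t read_capped(int fd, char *buf, size_t n)
{
    size_t total = 0;
    while (total < n) {
        size_t want = n - total;
        if (want > READ_CHUNK_MAX)
            want = READ_CHUNK_MAX;
        ssize_t got = read(fd, buf + total, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            break;  /* end of file */
        total += (size_t) got;
    }
    return (ssize_t) total;
}
```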
    
    ** 2.15 Bug fixes
      grep's \s and \S failed to work with multi-byte white space characters.
      For example, \s would fail to match a non-breaking space, and this
      would print nothing: printf '\xc2\xa0' | LC_ALL=en_US.UTF-8 grep '\s'
      A related bug is that \S would mistakenly match an invalid multibyte
      character.  For example, the following would match:
        printf '\x82\n' | LC_ALL=en_US.UTF-8 grep '^\S$'
      [bug present since grep-2.6]
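The byte sequences in these examples can be checked with a small UTF-8 decoder. The bytes \xc2\xa0 decode to the single character U+00A0 (NO-BREAK SPACE). A lone \x82 is a continuation byte with no lead byte, so it is not a character at all and \S should not match it. The decoder utf8_decode is a hypothetical sketch. It rejects overlong forms, surrogates, and values above U+10FFFF, and it is not grep's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: returns the length (1-4) of the UTF-8 sequence
   at s and stores its code point in *cp, or returns 0 if the bytes do
   not form a valid sequence. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else
        return 0;  /* 0x80-0xBF: stray continuation byte; 0xF8-0xFF: invalid */
    if (n < len)
        return 0;  /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;  /* expected a continuation byte 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;  /* overlong form, out of range, or a surrogate */
    *cp = c;
    return len;
}
```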
    
      grep -i would segfault on systems using UTF-16-based wchar_t (Cygwin)
      when converting an input string containing certain 4-byte UTF-8
      sequences to lower case.  The conversions to wchar_t and back to
      a UTF-8 multibyte string did not take surrogate pairs into account.
      [bug present since at least grep-2.6, though the segfault is new with 2.13]
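A 4-byte UTF-8 sequence encodes a code point above U+FFFF, which a 16-bit wchar_t can hold only as a surrogate pair. The sketch below shows the standard UTF-16 arithmetic for splitting and recombining such a pair. A case conversion has to work on the recombined code point, not on each half separately. The helpers utf16_encode and utf16_decode are hypothetical, not grep's or Cygwin's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder: write code point cp as UTF-16 and return the
   number of 16-bit units used (1 or 2).  Assumes cp is a valid Unicode
   scalar value (not a surrogate, at most U+10FFFF). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 + (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));   /* low surrogate */
    return 2;
}

/* Hypothetical decoder: read one code point, combining a surrogate pair
   when present; unpaired surrogates are returned as-is.  Stores the
   number of units consumed in *used. */
static uint32_t utf16_decode(const uint16_t *in, size_t n, size_t *used)
{
    if (n >= 2 && in[0] >= 0xD800 && in[0] <= 0xDBFF
        && in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
        *used = 2;
        return 0x10000 + (((uint32_t) (in[0] - 0xD800) << 10)
                          | (uint32_t) (in[1] - 0xDC00));
    }
    *used = 1;
    return in[0];
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, and decoding that pair yields U+1F600 again.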
    
      grep -E would segfault when given a regexp like '([^.]*[M]){1,2}'
      for any multibyte character M.  [bug introduced in grep-2.6, which would
      segfault; 2.7 and 2.8 had no problem, and 2.9 through 2.14 would
      hit a failed assertion.]
    
      grep -F would get stuck in an infinite loop when given a search string
      that is an invalid byte sequence in the current locale and that matches
      the bytes of the input twice on a line.  Now grep fails with exit status 1.
    
      grep -P could misbehave.  PCRE supports multibyte mode only in UTF-8
      locales, but grep did not activate that mode even there.  This caused
      failures to match multibyte characters against some regular expressions,
      especially those including the '.' or '\p' metacharacters.
    
    ** 2.15 New features
      grep -P can now use a just-in-time compiler to greatly speed up matches.
      This feature is transparent to the user; no flag is required to enable
      it.  It is available only if the corresponding support in the PCRE
      library is detected when grep is compiled.

Summary of changes:
 contrib/grep/README.DELETED                        |    4 +-
 contrib/grep/README.DRAGONFLY                      |   14 +-
 gnu/usr.bin/grep/Makefile                          |    2 +-
 gnu/usr.bin/grep/Makefile.inc0                     |   14 -
 gnu/usr.bin/grep/egrep/Makefile                    |    8 +-
 gnu/usr.bin/grep/egrep/egrep                       |   11 +
 gnu/usr.bin/grep/fgrep/Makefile                    |    8 +-
 gnu/usr.bin/grep/fgrep/fgrep                       |   11 +
 gnu/usr.bin/grep/grep/Makefile                     |   21 +-
 gnu/usr.bin/grep/grep/grep.1                       |   38 +-
 gnu/usr.bin/grep/libgrep/Makefile                  |   21 -
 gnu/usr.bin/grep/libgreputils/Makefile             |   22 +-
 gnu/usr.bin/grep/libgreputils/alloca.h             |    2 +-
 gnu/usr.bin/grep/libgreputils/config.h             |  265 ++-
 gnu/usr.bin/grep/libgreputils/configmake.h         |    4 +-
 gnu/usr.bin/grep/libgreputils/{fcntl.h => ctype.h} |  342 +---
 .../grep/libgreputils/{fcntl.h => dirent.h}        |  437 ++---
 gnu/usr.bin/grep/libgreputils/fcntl.h              |   18 +-
 gnu/usr.bin/grep/libgreputils/getopt.h             |    4 +-
 gnu/usr.bin/grep/libgreputils/{fcntl.h => iconv.h} |  341 +---
 gnu/usr.bin/grep/libgreputils/inttypes.h           | 1452 +++++++++++++++
 .../grep/libgreputils/{fcntl.h => langinfo.h}      |  431 ++---
 .../grep/libgreputils/{fcntl.h => locale.h}        |  423 ++---
 gnu/usr.bin/grep/libgreputils/stdio.h              | 1665 +++++++++++++++++
 gnu/usr.bin/grep/libgreputils/stdlib.h             | 1276 +++++++++++++
 gnu/usr.bin/grep/libgreputils/string.h             | 1341 ++++++++++++++
 gnu/usr.bin/grep/libgreputils/sys/stat.h           |    8 +-
 .../grep/libgreputils/{fcntl.h => sys/time.h}      |  438 ++---
 gnu/usr.bin/grep/libgreputils/sys/types.h          |   54 +
 gnu/usr.bin/grep/libgreputils/unistd.h             | 1869 ++++++++++++++++++++
 gnu/usr.bin/grep/libgreputils/unistr.h             |    2 +-
 gnu/usr.bin/grep/libgreputils/unitypes.h           |    2 +-
 gnu/usr.bin/grep/libgreputils/uniwidth.h           |    2 +-
 gnu/usr.bin/grep/libgreputils/wchar.h              | 1340 ++++++++++++++
 .../grep/libgreputils/{fcntl.h => wctype.h}        |  733 +++++---
 35 files changed, 10455 insertions(+), 2168 deletions(-)
 delete mode 100644 gnu/usr.bin/grep/Makefile.inc0
 create mode 100755 gnu/usr.bin/grep/egrep/egrep
 create mode 100755 gnu/usr.bin/grep/fgrep/fgrep
 delete mode 100644 gnu/usr.bin/grep/libgrep/Makefile
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => ctype.h} (60%)
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => dirent.h} (66%)
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => iconv.h} (63%)
 create mode 100644 gnu/usr.bin/grep/libgreputils/inttypes.h
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => langinfo.h} (61%)
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => locale.h} (65%)
 create mode 100644 gnu/usr.bin/grep/libgreputils/stdio.h
 create mode 100644 gnu/usr.bin/grep/libgreputils/stdlib.h
 create mode 100644 gnu/usr.bin/grep/libgreputils/string.h
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => sys/time.h} (64%)
 create mode 100644 gnu/usr.bin/grep/libgreputils/sys/types.h
 create mode 100644 gnu/usr.bin/grep/libgreputils/unistd.h
 create mode 100644 gnu/usr.bin/grep/libgreputils/wchar.h
 copy gnu/usr.bin/grep/libgreputils/{fcntl.h => wctype.h} (55%)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/51ddd709576b6d603cb35ff07b103a739f875a02


-- 
DragonFly BSD source repository