[DragonFlyBSD - Bug #2824] New higher speed CRC code

Tue Jun 9 06:04:55 PDT 2015

Issue #2824 has been updated by alexh.

It doesn't save any operation/instruction with an optimizing compiler.

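Even though it should be obvious, just to back it up with some real generated code, here are the critical loops of both versions (compiled with gcc -O3), the committed one first and the proposed one second. Both loops execute the same eight instructions; the only difference is a 1-byte saving on the encoding of the xor, because the proposed version's memory operand needs no displacement byte. No real savings, and really no point in "optimizing" like that. The compiler does a better job :)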

  10:   48 83 c6 01             add    $0x1,%rsi
  14:   89 c1                   mov    %eax,%ecx
  16:   c1 e8 08                shr    $0x8,%eax
  19:   32 4e ff                xor    -0x1(%rsi),%cl
  1c:   0f b6 c9                movzbl %cl,%ecx
  1f:   33 04 8d 00 00 00 00    xor    0x0(,%rcx,4),%eax
  26:   48 39 d6                cmp    %rdx,%rsi
  29:   75 e5                   jne    10 <singletable_crc32c+0x10>

  40:   89 c1                   mov    %eax,%ecx
  42:   32 0e                   xor    (%rsi),%cl
  44:   48 83 c6 01             add    $0x1,%rsi
  48:   c1 e8 08                shr    $0x8,%eax
  4b:   0f b6 c9                movzbl %cl,%ecx
  4e:   33 04 8d 00 00 00 00    xor    0x0(,%rcx,4),%eax
  55:   48 39 d6                cmp    %rdx,%rsi
  58:   75 e6                   jne    40 <singletable_crc32c_carey+0x10>
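
For anyone who wants to reproduce the comparison, here is a minimal sketch with both loops in one file. The two function names are taken from the disassembly above. The table setup is an assumption: crc32c_init_table() is a hypothetical helper that fills a stand-in crc32Table with the reflected CRC-32C (Castagnoli) polynomial, which is what the crc32c name suggests; the kernel's own table definition is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's crc32Table, filled at run time (assumption). */
static uint32_t crc32Table[256];

/*
 * Hypothetical helper: build the byte-wise lookup table for the
 * reflected CRC-32C (Castagnoli) polynomial 0x82F63B78.
 */
void
crc32c_init_table(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;

		for (int k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
		crc32Table[n] = c;
	}
}

/* Committed form: decrement size, post-increment the pointer. */
uint32_t
singletable_crc32c(uint32_t crc, const void *buf, size_t size)
{
	const uint8_t *p = buf;

	while (size--)
		crc = crc32Table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

	return crc;
}

/* Proposed form: a single index variable drives the loop. */
uint32_t
singletable_crc32c_carey(uint32_t crc, const void *buf, size_t size)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i < size; ++i)
		crc = crc32Table[(crc ^ p[i]) & 0xff] ^ (crc >> 8);

	return crc;
}
```

Saved as, say, crc.c, this can be built with "gcc -O3 -c crc.c" and inspected with "objdump -d crc.o". The exact instruction order and addresses will depend on the compiler version and target, so the output may not match the listing above byte for byte.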

----------------------------------------
Bug #2824: New higher speed CRC code
http://bugs.dragonflybsd.org/issues/2824#change-12667

* Author: robin.carey1
* Status: New
* Priority: Normal
* Assignee: 
* Category: 
* Target version: 
----------------------------------------
Dear DragonFlyBSD bugs,

This isn't really a bug. I noticed there is the possibility of improving
the performance of the recently committed new CRC code ("fast iscsi crc
code").

In the following function:

sys/libkern/icrc32.c
<http://gitweb.dragonflybsd.org/dragonfly.git/blob/d557434b1f5510b6fed895379af444f0d034c07b:/sys/libkern/icrc32.c>

static uint32_t
singletable_crc32c(uint32_t crc, const void *buf, size_t size)
{
       const uint8_t *p = buf;

       while (size--)
               crc = crc32Table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

       return crc;
}

The two separate operations of "size--" and "*p++" could be combined into
one operation. The way that I would do that would be something like:

...
size_t i;
for (i = 0; i < size; ++i) {
  crc = crc32Table[(crc ^ p[i]) & 0xff] ^ (crc >> 8);
}
...

So you would be saving one operation per loop iteration, which should improve performance.

I haven't looked at the rest of the code, so perhaps there are other
performance improvements that could be had.

Hope this helps ...

-- 
Sincerely,

Robin Carey BSc
