Bug #2824
New higher speed CRC code (closed)
Description
Dear DragonFlyBSD bugs,
This isn't really a bug. I noticed there is the possibility of improving
the performance of the recently committed new CRC code ("fast iscsi crc
code").
In the following function:
sys/libkern/icrc32.c
<http://gitweb.dragonflybsd.org/dragonfly.git/blob/d557434b1f5510b6fed895379af444f0d034c07b:/sys/libkern/icrc32.c>
    static uint32_t
    singletable_crc32c(uint32_t crc, const void *buf, size_t size)
    {
            const uint8_t *p = buf;

            while (size--)
                    crc = crc32Table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
            return crc;
    }
The two separate operations of "size--" and "*p++" could be combined into
one operation. The way that I would do that would be something like:
    ...
            size_t i;

            for (i = 0; i < size; ++i) {
                    crc = crc32Table[(crc ^ p[i]) & 0xff] ^ (crc >> 8);
            }
    ...
That would save one operation per loop iteration, which should improve performance.
I haven't looked at the rest of the code, so perhaps there are other
performance improvements that could be had.
Hope this helps ...
--
Sincerely,
Robin Carey BSc
Updated by alexh over 9 years ago
It doesn't save any operation/instruction with an optimizing compiler.
Even though it should be obvious, just to back it up with some real generated code, here are the critical loops of both versions (compiled with gcc -O3), the original first and the proposed version second. The only difference is a 1-byte saving on the encoding of the xor. No real savings, and really no point in "optimizing" like that. The compiler does a better job :)
    10:   48 83 c6 01             add    $0x1,%rsi
    14:   89 c1                   mov    %eax,%ecx
    16:   c1 e8 08                shr    $0x8,%eax
    19:   32 4e ff                xor    -0x1(%rsi),%cl
    1c:   0f b6 c9                movzbl %cl,%ecx
    1f:   33 04 8d 00 00 00 00    xor    0x0(,%rcx,4),%eax
    26:   48 39 d6                cmp    %rdx,%rsi
    29:   75 e5                   jne    10 <singletable_crc32c+0x10>

    40:   89 c1                   mov    %eax,%ecx
    42:   32 0e                   xor    (%rsi),%cl
    44:   48 83 c6 01             add    $0x1,%rsi
    48:   c1 e8 08                shr    $0x8,%eax
    4b:   0f b6 c9                movzbl %cl,%ecx
    4e:   33 04 8d 00 00 00 00    xor    0x0(,%rcx,4),%eax
    55:   48 39 d6                cmp    %rdx,%rsi
    58:   75 e6                   jne    40 <singletable_crc32c_carey+0x10>
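
For anyone who wants to repeat the comparison, here is a minimal sketch of a source file containing both loops. It is not the kernel file: the table is a stand-in generated at run time, assuming crc32Table holds the reflected CRC-32C (Castagnoli) polynomial 0x82F63B78. crc32c_init_table() and crc_versions_agree() are hypothetical helpers added for illustration, and "static" is dropped from the two CRC functions so the compiler keeps them in the object file.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: stand-in for the kernel's crc32Table, filled at run time
 * from the reflected CRC-32C (Castagnoli) polynomial 0x82F63B78.
 */
uint32_t crc32Table[256];

/* Hypothetical helper: build the 256-entry lookup table, one bit at a time. */
void
crc32c_init_table(void)
{
	uint32_t n, c;
	int k;

	for (n = 0; n < 256; n++) {
		c = n;
		for (k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
		crc32Table[n] = c;
	}
}

/* Original loop from the report: count down size, post-increment p. */
uint32_t
singletable_crc32c(uint32_t crc, const void *buf, size_t size)
{
	const uint8_t *p = buf;

	while (size--)
		crc = crc32Table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}

/* Proposed indexed loop; the _carey suffix matches the symbol in the listing. */
uint32_t
singletable_crc32c_carey(uint32_t crc, const void *buf, size_t size)
{
	const uint8_t *p = buf;
	size_t i;

	for (i = 0; i < size; ++i)
		crc = crc32Table[(crc ^ p[i]) & 0xff] ^ (crc >> 8);
	return crc;
}

/* Hypothetical helper: returns 1 when both loops produce the same CRC. */
int
crc_versions_agree(const void *buf, size_t size)
{
	return singletable_crc32c(0xffffffffu, buf, size) ==
	    singletable_crc32c_carey(0xffffffffu, buf, size);
}
```

Call crc32c_init_table() once before using either function. Compiling the file with gcc -O3 -c and running objdump -d on the resulting object should show two inner loops comparable to the ones above, though the exact instructions depend on the compiler version and target.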