Patch for i386 to use bsrl for libc fls().
A patch to libc to use bsr for ffs.
#2 Updated by TGEN over 4 years ago
On 05/25/2010 16:15, Venkatesh Srinivas (via DragonFly issue tracker) wrote:
> Also, let's pretend I didn't make a silly mistake.
> The jz 1 and label 1: should become jz L1 and L1:
> Jumping to the address 0x1 is silly.
You can keep the label as 1: and use jz 1f instead (the f is for
'forward', as opposed to b, 'backward').
#3 Updated by vsrinivas over 4 years ago
This is an improved version of the patch that corrects my silly jz mistake and
adds ffsl and flsl. It also implements both for x86_64. The x86-64 version is
branchless, using cmov.
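
For reference, a minimal C sketch of the semantics the assembly versions have to match: ffs returns the 1-based index of the least significant set bit, fls that of the most significant set bit, and both return 0 for an input of 0. The names ref_ffs and ref_fls are hypothetical, chosen to avoid clashing with the libc symbols; this is not the code from the patch.

```c
/* Portable reference versions (hypothetical names, not the patch itself). */

/* 1-based index of the least significant set bit, 0 if x == 0. */
int ref_ffs(unsigned int x)
{
    int bit;

    if (x == 0)
        return 0;
    for (bit = 1; (x & 1u) == 0; bit++)
        x >>= 1;
    return bit;
}

/* 1-based index of the most significant set bit, 0 if x == 0. */
int ref_fls(unsigned int x)
{
    int bit;

    for (bit = 0; x != 0; bit++)
        x >>= 1;
    return bit;
}
```

The long variants (ffsl, flsl) follow the same pattern on unsigned long.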
This alternate snippet would allow a branchless ffs/fls (and ffsl/flsl) on
i386, except that bsr and bsf are specified differently by Intel and AMD when
the source operand is zero: AMD documents the destination as unchanged, Intel
as undefined. It may also be better on x86_64 than the cmov-based version:
xorl %eax, %eax
xorl %ecx, %ecx
bsrl 4(%esp), %eax
addl %ecx, %eax
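
To make the idea behind the snippet concrete, here is a rough C model of it. It assumes the AMD-documented behaviour (destination left untouched when the source is zero) and assumes that %ecx is meant to end up as 1 for a nonzero input, for example via a setnz after the bsrl; that step is my assumption and is not in the snippet as posted. model_bsr and model_fls are hypothetical names, and the C model itself is of course not branchless.

```c
/*
 * Model of bsr under the AMD-documented behaviour: on a nonzero source,
 * store the 0-based index of the highest set bit in *dst and return 1;
 * on a zero source, leave *dst untouched and return 0.
 * (Intel documents the destination as undefined in the zero case.)
 */
int model_bsr(unsigned int src, unsigned int *dst)
{
    unsigned int idx = 0;

    if (src == 0)
        return 0;
    while (src >>= 1)
        idx++;
    *dst = idx;
    return 1;
}

/* fls built the way the snippet appears to intend. */
int model_fls(unsigned int x)
{
    unsigned int eax = 0;                   /* xorl %eax, %eax */
    unsigned int ecx = 0;                   /* xorl %ecx, %ecx */

    ecx = (unsigned int)model_bsr(x, &eax); /* bsrl, plus the assumed setnz %cl */
    return (int)(eax + ecx);                /* addl %ecx, %eax */
}
```

With a zero input both registers stay 0, so the result is 0 without a branch; on hardware that really leaves the destination undefined, the pre-zeroed %eax could not be relied on.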
#4 Updated by vsrinivas about 4 years ago
I just tested the i386 fls against its C equivalent on every input, to check
that it produces the same output; it passed. Running the C version, optimized
by gcc, took 49 sec on my hardware; the assembler version took 10 sec.
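
A comparison of this kind can be sketched in C roughly as follows. This is not the harness I used; fls_loop, fls_binsearch and count_mismatches are hypothetical names, fls_binsearch merely stands in for the implementation under test, and the code assumes a 32-bit unsigned int.

```c
#include <stdint.h>

typedef int (*fls_fn)(unsigned int);

/* Straightforward reference: shift until the value is empty. */
int fls_loop(unsigned int x)
{
    int bit;

    for (bit = 0; x != 0; bit++)
        x >>= 1;
    return bit;
}

/* Stand-in for the implementation under test: binary search over the bits. */
int fls_binsearch(unsigned int x)
{
    int r = 1;

    if (x == 0)
        return 0;
    if (x & 0xffff0000u) { x >>= 16; r += 16; }
    if (x & 0xff00u)     { x >>= 8;  r += 8; }
    if (x & 0xf0u)       { x >>= 4;  r += 4; }
    if (x & 0xcu)        { x >>= 2;  r += 2; }
    if (x & 0x2u)        { r += 1; }
    return r;
}

/*
 * Count inputs in [first, last], visited every `step` values, on which
 * a and b disagree. step must be nonzero. The counter is 64-bit so the
 * loop still terminates when last == UINT32_MAX.
 */
unsigned long count_mismatches(fls_fn a, fls_fn b,
                               uint32_t first, uint32_t last, uint32_t step)
{
    unsigned long bad = 0;
    uint64_t x;

    for (x = first; x <= last; x += step)
        if (a((unsigned int)x) != b((unsigned int)x))
            bad++;
    return bad;
}
```

Calling count_mismatches(fls_loop, fls_binsearch, 0, UINT32_MAX, 1) walks every 32-bit input; a larger step gives a quick smoke test.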
The x86_64 version has been tested on a few inputs, but only as standalone .S
files, not inside the DragonFly libc.