Bug #1770

Patch for i386 to use bsrl for libc fls().

Added by vsrinivas over 4 years ago. Updated over 4 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Description

Hi,

A patch to libc to use bsr for fls.

Thanks,
-- vs

0001-libc-Add-i386-version-of-fls-using-bsr.patch (3.06 KB) vsrinivas, 05/24/2010 03:41 AM

0001-libc-Add-assembler-ffs-fls-ffsl-flsl-to-i386-and.patch (12.8 KB) vsrinivas, 05/26/2010 05:14 AM

0001-libc-Add-assembler-ffs-fls-ffsl-flsl-to-i386.patch (12.8 KB) vsrinivas, 06/04/2010 04:38 AM

0001-libc-Add-assembler-ffs-fls-ffsl-flsl.patch (12.9 KB) vsrinivas, 06/04/2010 07:01 PM

History

#1 Updated by vsrinivas over 4 years ago

Also, let's pretend I didn't make a silly mistake.

The jz 1 and label 1: should become jz L1 and L1:

Jumping to the address 0x1 is silly.

#2 Updated by TGEN over 4 years ago

On 05/25/2010 16:15, Venkatesh Srinivas (via DragonFly issue tracker) wrote:
> Also, let's pretend I didn't make a silly mistake.
> The jz 1 and label 1: should become jz L1 and L1:
>
> Jumping to the address 0x1 is silly.

You can keep the label as 1: and use jz 1f instead (the f is for
'forward', as opposed to b, 'backward').

Cheers,
--
Thomas E. Spanjaard
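
To illustrate the numeric local label suggested above, here is a minimal sketch in C with GCC-style inline assembly. It is not the code from the attached patches: the function name fls_local_label is hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler emitting AT&T syntax on x86. On any other target it falls back to a plain loop.

```c
/* Hypothetical illustration of "jz 1f" with a numeric local label "1:". */
int fls_local_label(unsigned int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int r = 0;                      /* result stays 0 when x == 0 */
    __asm__ (
        "testl %1, %1\n\t"          /* set ZF if x is zero */
        "jz 1f\n\t"                 /* 1f = nearest label "1:" looking forward */
        "bsrl %1, %0\n\t"           /* 0-based index of the highest set bit */
        "incl %0\n"                 /* fls() results are 1-based */
        "1:"
        : "+r" (r)
        : "r" (x)
        : "cc");
    return r;
#else
    /* Portable fallback (assumption: used only on non-x86 targets). */
    int r = 0;
    while (x != 0) {
        r++;
        x >>= 1;
    }
    return r;
#endif
}
```

Because "1:" may be redefined any number of times in one file, "1f" and "1b" always resolve to the nearest definition in the given direction, which is what makes them convenient in short routines like this one.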

#3 Updated by vsrinivas over 4 years ago

Hi,

This is an improved version of the patch that corrects my silly jz mistake and
adds ffsl and flsl. It also implements both for x86_64. The x86-64 version is
branchless, using cmov.

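This alternate snippet may be better and would allow for a branchless f?s(l) on
i386, except that bsr and bsf are defined differently on Intel and AMD x86
machines when the source operand is zero. It may be better on x86_64 than the
cmov-based version:
	xorl	%eax, %eax	/* result defaults to 0 */
	xorl	%ecx, %ecx	/* clear %ecx so setnz leaves 0 or 1 in it */
	bsrl	4(%esp), %eax	/* index of highest set bit; ZF set if input is 0 */
	setnz	%cl		/* %cl = 1 if the input was nonzero */
	addl	%ecx, %eax	/* turn the 0-based index into a 1-based result */
	ret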

Thanks,
-- vs
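
For readers following the branchless sequence above, this is a rough C model of what it computes, instruction by instruction. It is only a sketch: the names model_bsr and branchless_fls_model are hypothetical, and the model assumes that bsr leaves its destination untouched when the source is zero. As far as I know, AMD documents that behaviour while Intel documents the destination as undefined in that case, which is the portability concern mentioned above.

```c
#include <stdint.h>

/*
 * Model of bsr under the ASSUMPTION that a zero source leaves *dest
 * unchanged. Returns the resulting zero flag (1 if src was zero).
 */
int model_bsr(uint32_t src, uint32_t *dest)
{
    uint32_t idx = 31;

    if (src == 0)
        return 1;                   /* ZF = 1, destination untouched */
    while ((src & ((uint32_t)1 << idx)) == 0)
        idx--;
    *dest = idx;                    /* 0-based index of highest set bit */
    return 0;                       /* ZF = 0 */
}

/* Step-by-step model of the six-instruction sequence. */
int branchless_fls_model(uint32_t x)
{
    uint32_t eax = 0;               /* xorl %eax, %eax */
    uint32_t ecx = 0;               /* xorl %ecx, %ecx */
    int zf = model_bsr(x, &eax);    /* bsrl 4(%esp), %eax */

    ecx = zf ? 0 : 1;               /* setnz %cl */
    eax += ecx;                     /* addl %ecx, %eax */
    return (int)eax;                /* ret */
}
```

With a zero input both registers stay 0 and the sum is 0; with any other input the index is bumped by one, matching the 1-based result fls() is expected to return.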

#4 Updated by vsrinivas over 4 years ago

I just tested the i386 fls against its C equivalent on every input, to see that
it produces the same output; it passed. Running the C version, optimized by
gcc, took 49 sec on my hardware; the assembler version took 10 sec.

The x86_64 version is tested on a few test inputs, but not in the Dfly libc,
only as standalone .S files.
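
The exhaustive comparison described above could be driven by a harness along these lines. This is a sketch, not the test that was actually run: ref_fls, fast_fls and count_mismatches are hypothetical names, and fast_fls uses the GCC/Clang builtin __builtin_clz as a stand-in for the assembler routine.

```c
#include <limits.h>
#include <stdint.h>

/* Straightforward loop-based reference, 1-based, 0 for a zero mask. */
int ref_fls(unsigned int mask)
{
    int bit = 0;

    while (mask != 0) {
        bit++;
        mask >>= 1;
    }
    return bit;
}

/* Stand-in for the optimized version (assumption: GCC/Clang builtin). */
int fast_fls(unsigned int mask)
{
    if (mask == 0)
        return 0;                   /* __builtin_clz(0) is undefined */
    return (int)(sizeof(mask) * CHAR_BIT) - __builtin_clz(mask);
}

/* Count inputs in [first, last] on which the two versions disagree. */
uint64_t count_mismatches(uint32_t first, uint32_t last)
{
    uint64_t bad = 0;
    uint64_t v;                     /* 64-bit counter so last == UINT32_MAX terminates */

    for (v = first; v <= last; v++) {
        if (ref_fls((unsigned int)v) != fast_fls((unsigned int)v))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches(0, UINT32_MAX) covers every 32-bit input; a result of 0 means the two implementations agree everywhere. Smaller ranges are handy for a quick sanity check before the full run.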

#5 Updated by vsrinivas over 4 years ago

Correct a whitespace issue when the patch is applied.

#6 Updated by vsrinivas over 4 years ago

x86_64 doesn't have DEFS.h.

#7 Updated by dillon over 4 years ago

:
:Venkatesh Srinivas <> added the comment:
:
:x86_64 doesn't have DEFS.h.

Venkatesh, could you roll up all these patches and throw them into
your git repo? It's easier to bring them in and test them that way.

-Matt

#8 Updated by vsrinivas over 4 years ago

Okay - it's in my git tree on leaf, branch libc-asm.

-- vs

#9 Updated by vsrinivas over 4 years ago

Resolved by commit 6cb1335d825070c7c4f5f0267d2b4c4b7dfd86be.
