Bug #1525

boehm-gc problems

Added by hasso about 5 years ago. Updated about 2 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Although devel/boehm-gc builds on DragonFly, there are problems: its own
test suite doesn't run. The test for garbage-collecting functionality either
segfaults or hangs, and if boehm-gc is built with threading support (making it
treat DragonFly as FreeBSD), the threading test does the same.

Note that boehm-gc is used by many pieces of software (mainly various
programming languages) and is probably one of the reasons why so many languages
actually fail to run on DragonFly.

How to test:

fetch http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc-7.1.tar.gz
tar zxf gc-7.1.tar.gz
cd gc-7.1
./configure --disable-threads
gmake
gmake check

hackish-threaded.patch (7.94 KB) c.turner1, 10/13/2012 02:47 PM

History

#1 Updated by jgordeev about 5 years ago

I'll summarize what I've learned so far.
Boehm GC is a mark-sweep garbage collector.
The crashes and the hangs happen because the GC garbage collects its own
internal data structures.
How could this happen?! Easy.
The mark procedure starts marking objects as reachable beginning from a
number of root regions, which include, but are not limited to, the
program's global data area and the global data areas of dynamically
loaded libraries. Unfortunately, on DragonFly, the latter part doesn't
work as intended. So, the garbage collector, which itself is a
dynamically loaded library, doesn't mark its own data structures, and
the sweeper frees the memory they occupy.
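To make the failure mode concrete, here is a minimal sketch of a mark-sweep cycle with explicitly registered roots. It is a toy model and not Boehm GC code: all toy_* names, the fixed-size heap, and the single-pointer objects are assumptions made for illustration. An object that is reachable only from a root that was never registered gets swept, which mirrors what appears to happen to libgc's own data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a Boehm GC internal. */
#define TOY_HEAP_SIZE 8
#define TOY_MAX_ROOTS 4

struct toy_obj {
    struct toy_obj *ref;  /* single outgoing pointer, may be NULL */
    int allocated;        /* 1 while the slot is in use */
    int marked;           /* set during the mark phase */
};

static struct toy_obj toy_heap[TOY_HEAP_SIZE];
static struct toy_obj **toy_roots[TOY_MAX_ROOTS];  /* registered root slots */
static size_t toy_nroots;

/* Register a pointer variable as a root (comparable in spirit to a root region). */
void toy_add_root(struct toy_obj **slot)
{
    assert(toy_nroots < TOY_MAX_ROOTS);
    toy_roots[toy_nroots++] = slot;
}

/* Hand out the first free heap slot, or NULL when the heap is full. */
struct toy_obj *toy_alloc(void)
{
    for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
        if (!toy_heap[i].allocated) {
            toy_heap[i].allocated = 1;
            toy_heap[i].marked = 0;
            toy_heap[i].ref = NULL;
            return &toy_heap[i];
        }
    }
    return NULL;
}

/* Mark everything reachable from registered roots, then sweep the rest.
 * Returns the number of objects freed. */
size_t toy_collect(void)
{
    size_t freed = 0;

    /* Mark phase: follow each chain until NULL or an already marked object. */
    for (size_t i = 0; i < toy_nroots; i++) {
        for (struct toy_obj *o = *toy_roots[i]; o != NULL && !o->marked; o = o->ref)
            o->marked = 1;
    }

    /* Sweep phase: free unmarked objects and clear marks for the next cycle. */
    for (size_t i = 0; i < TOY_HEAP_SIZE; i++) {
        if (toy_heap[i].allocated && !toy_heap[i].marked) {
            toy_heap[i].allocated = 0;
            freed++;
        }
        toy_heap[i].marked = 0;
    }
    return freed;
}
```

If a variable app_data is registered with toy_add_root() and a second variable lib_data is not, toy_collect() keeps everything reachable from app_data and frees the object lib_data points to, even though lib_data is still in use.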
In the source code of the GC, functions of interest are
GC_add_roots_inner() in mark_rts.c and GC_FirstDLOpenedLinkMap(),
starting at line 493, in dyn_load.c.
GC_add_roots_inner() should get called once for the main executable and
for each dynamic library loaded.
In GC_FirstDLOpenedLinkMap() we start from _DYNAMIC and in a loop try to
find an Elf_Dyn entry with tag DT_DEBUG.
On DragonFly, the main executable has such an entry, however when
GC_FirstDLOpenedLinkMap() iterates over the list it seems to iterate
over the list of entries of libgc - a dynamic library which lacks
DT_DEBUG entries in its .dynamic section. The DT_DEBUG entry we are
looking for isn't found.
On FreeBSD, when we iterate over the list, we get a list of tags that is
a prefix of the tags of the main executable.
The tags on the main executable:
1 1 1 15 12 13 4 5 6 10 11 21 20 23 17 18 19 0x6ffffffe 0x6fffffff
0x6ffffff0
What we get:
1 1 1 15 12 13 4 5 6 10 11 21

I'm leaving this problem report for now, to let my head recover.
If you've got any ideas, please let me know.

Here's some less important stuff: how well FreeBSD fares. Building the Boehm GC on FreeBSD
7.1 and running 'gmake check' produces:
Boehm GC version 6.8 built from ports:
FAIL: gctest
PASS: test_cpp
==================================
1 of 2 tests failed
Please report to
==================================

Boehm GC version 6.8 built manually:
PASS: gctest
==================
All 1 tests passed
==================

Boehm GC version 7.1 built manually:
PASS: gctest
PASS: leaktest
PASS: middletest
PASS: smashtest
PASS: hugetest
PASS: threadleaktest
==================
All 6 tests passed
==================

#2 Updated by jgordeev about 5 years ago

Oh, it's obvious why on FreeBSD we get only a prefix of the list. The
search simply stops when we find a DT_DEBUG entry (code 21).
So, all the focus should be on what _DYNAMIC should point to.
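
The early exit can be illustrated with a small scan over a DT_NULL-terminated array. This is a sketch under assumptions: struct toy_dyn is a simplified stand-in for Elf_Dyn, the toy_* names are hypothetical, and the library-like array is made up. Only the executable's tag list is taken from comment #1, and the tag values 0 (DT_NULL) and 21 (DT_DEBUG) come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

/* Tag values as defined by the ELF specification. */
#define TOY_DT_NULL  0L
#define TOY_DT_DEBUG 21L

/* Simplified, hypothetical stand-in for Elf_Dyn; only the tag matters here. */
struct toy_dyn {
    long tag;
    unsigned long val;
};

/* Walk a DT_NULL-terminated array looking for DT_DEBUG.
 * Returns the index of the DT_DEBUG entry, or -1 if there is none.
 * *visited receives the number of entries examined before stopping. */
long toy_find_dt_debug(const struct toy_dyn *dyn, size_t *visited)
{
    size_t i;

    assert(dyn != NULL && visited != NULL);
    for (i = 0; dyn[i].tag != TOY_DT_NULL; i++) {
        if (dyn[i].tag == TOY_DT_DEBUG) {
            *visited = i + 1;  /* the search stops here, hence the "prefix" */
            return (long)i;
        }
    }
    *visited = i;
    return -1;
}

/* Tags of the main executable, copied from comment #1. */
const struct toy_dyn toy_exe_dynamic[] = {
    {1, 0}, {1, 0}, {1, 0}, {15, 0}, {12, 0}, {13, 0}, {4, 0}, {5, 0},
    {6, 0}, {10, 0}, {11, 0}, {21, 0}, {20, 0}, {23, 0}, {17, 0}, {18, 0},
    {19, 0}, {0x6ffffffeL, 0}, {0x6fffffffL, 0}, {0x6ffffff0L, 0},
    {TOY_DT_NULL, 0}
};

/* Made-up library-like array without a DT_DEBUG entry. */
const struct toy_dyn toy_lib_dynamic[] = {
    {1, 0}, {14, 0}, {12, 0}, {13, 0}, {4, 0}, {5, 0}, {6, 0}, {10, 0},
    {11, 0}, {TOY_DT_NULL, 0}
};
```

Scanning toy_exe_dynamic stops after 12 entries, matching the 12-tag prefix seen on FreeBSD, while scanning toy_lib_dynamic runs to the terminator and reports that no DT_DEBUG entry exists, which is the DragonFly symptom.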

#3 Updated by corecode about 5 years ago

Jordan Gordeev wrote:
> Jordan Gordeev wrote:
>> On FreeBSD, when we iterate over the list, we get a list of tags that
>> is a prefix of the tags of the main executable.
>> The tags on the main executable:
>> 1 1 1 15 12 13 4 5 6 10 11 21 20 23 17 18 19 0x6ffffffe 0x6fffffff
>> 0x6ffffff0
>> What we get:
>> 1 1 1 15 12 13 4 5 6 10 11 21
> Oh, it's obvious why on FreeBSD we get only a prefix of the list. The
> search simply stops when we find a DT_DEBUG entry (code 21).
> So, all the focus should be on what _DYNAMIC should point to.

kudos, good detective work!

#4 Updated by jgordeev about 5 years ago

There's a difference between DragonFly's ld and FreeBSD's ld.
Let's take the following program:
extern void *vodka;

void *
f(void)
{
return vodka;
}

We compile it into a shared library:
%cc -c -o vodka.o vodka.c
%ld -Bshareable -o libvodka.so vodka.o

Then we do:
%objdump -t libvodka.so

Doing the above steps on DragonFly produces:

libvodka.so: file format elf32-i386

SYMBOL TABLE:
00000094 l d .hash 00000000 .hash
000000c4 l d .dynsym 00000000 .dynsym
00000134 l d .dynstr 00000000 .dynstr
00000158 l d .rel.dyn 00000000 .rel.dyn
00000160 l d .text 00000000 .text
0000116c l d .dynamic 00000000 .dynamic
000011e4 l d .got.plt 00000000 .got.plt
00000000 l d .comment 00000000 .comment
00000000 l d *ABS* 00000000 .shstrtab
00000000 l d *ABS* 00000000 .symtab
00000000 l d *ABS* 00000000 .strtab
00000000 l df *ABS* 00000000 vodka.c
0000116c l O *ABS* 00000000 .hidden _DYNAMIC
000011e4 l O *ABS* 00000000 .hidden _GLOBAL_OFFSET_TABLE_
00000160 g F .text 0000000a f
00000000 *UND* 00000000 vodka
000011f0 g *ABS* 00000000 __bss_start
000011f0 g *ABS* 00000000 _edata
000011f0 g *ABS* 00000000 _end

Doing the steps on FreeBSD produces:

libvodka.so: file format elf32-i386-freebsd

SYMBOL TABLE:
00000094 l d .hash 00000000
00000128 l d .dynsym 00000000
00000248 l d .dynstr 00000000
00000288 l d .rel.dyn 00000000
00000290 l d .text 00000000
0000129c l d .data 00000000
0000129c l d .dynamic 00000000
00001314 l d .got 00000000
00001320 l d .bss 00000000
00000000 l d .comment 00000000
00000000 l d *ABS* 00000000
00000000 l d *ABS* 00000000
00000000 l d *ABS* 00000000
00000000 l df *ABS* 00000000 vodka.c
0000129c g O *ABS* 00000000 _DYNAMIC
00000290 g F .text 0000000a f
00000000 *UND* 00000000 vodka
00001320 g *ABS* 00000000 __bss_start
00001320 g *ABS* 00000000 _edata
00001314 g O *ABS* 00000000 _GLOBAL_OFFSET_TABLE_
00001320 g *ABS* 00000000 _end

Please note that '_DYNAMIC' is a local symbol on DragonFly and a global
one on FreeBSD.
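
The 'l' and 'g' flags and the '.hidden' marker in the listings come from two fields of each ELF symbol table entry: the binding in the upper four bits of st_info and the visibility in the lower two bits of st_other. A minimal sketch of that decoding follows. The toy_* names are hypothetical, the constants are copied from the ELF specification, and bindings other than local and global are simply reported as '?' here.

```c
#include <assert.h>

/* Constants from the ELF specification, prefixed to avoid clashing with <elf.h>. */
#define TOY_STB_LOCAL  0u
#define TOY_STB_GLOBAL 1u
#define TOY_STV_HIDDEN 2u

/* The binding is stored in the high four bits of st_info. */
unsigned toy_st_bind(unsigned char st_info)
{
    return (unsigned)st_info >> 4;
}

/* The visibility is stored in the low two bits of st_other. */
unsigned toy_st_visibility(unsigned char st_other)
{
    return st_other & 0x3u;
}

/* Map a binding to the 'l' / 'g' flag seen in the listings above. */
char toy_binding_flag(unsigned char st_info)
{
    switch (toy_st_bind(st_info)) {
    case TOY_STB_LOCAL:  return 'l';
    case TOY_STB_GLOBAL: return 'g';
    default:             return '?';  /* weak and other bindings: not handled in this sketch */
    }
}

/* True when the symbol would be shown with a '.hidden' marker. */
int toy_is_hidden(unsigned char st_other)
{
    return toy_st_visibility(st_other) == TOY_STV_HIDDEN;
}

/* Sanity checks: 0x01 is a local object symbol, 0x11 a global object symbol. */
void toy_self_check(void)
{
    assert(toy_binding_flag(0x01) == 'l');
    assert(toy_binding_flag(0x11) == 'g');
    assert(toy_is_hidden(0x02));
    assert(!toy_is_hidden(0x00));
}
```

In these terms, DragonFly's _DYNAMIC entry decodes to a local, hidden symbol and FreeBSD's to a global symbol with default visibility.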

#5 Updated by corecode about 5 years ago

Jordan Gordeev wrote:
> Please, note that '_DYNAMIC' is a local symbol on DragonFly and a global
> one on FreeBSD.

I believe that's because FreeBSD uses an older linker. With more recent
linkers, it is the same as in DragonFly (I checked Linux x86_64).

I also checked, and &_DYNAMIC is different in a shared lib and in the
main program, on both DragonFly and Linux, so that also shouldn't be the
underlying issue.

cheers
simo

#6 Updated by alexh about 3 years ago

Did anyone else ever look into this?

#7 Updated by c.turner1 about 2 years ago

This hacky patch fixes the non-threaded test on i386 running recent current (still gcc44).
Basically, it looks like the #ifdefs haven't kept up with the dynamic linker / ELF toolkit,
so a lot of the byte / symbol mangling isn't needed and GC can use fancier built-in APIs to
programmatically access the stuff.

I'll clean this up later and take a crack at the threaded version.

Cheers,

- Chris

# diff -urw dyn_load.c.orig dyn_load.c
--- dyn_load.c.orig 2012-08-09 20:25:13.000000000 +0000
+++ dyn_load.c 2012-10-11 17:53:50.000000000 +0000
@@ -83,6 +83,12 @@
# define ELFSIZE ARCH_ELFSIZE
#endif

+#if defined(__DragonFly__)
+# include <elf.h>
+# include <dlfcn.h>
+# include <link.h>
+#endif
+
#if defined(SCO_ELF) || defined(DGUX) || defined(HURD) \
|| (defined(__ELF__) && (defined(LINUX) || defined(FREEBSD) \
|| defined(NETBSD) || defined(OPENBSD)))
@@ -398,7 +404,7 @@
# pragma weak dl_iterate_phdr
#endif

-#if (defined(FREEBSD) && __FreeBSD__ >= 7)
+#if (defined(FREEBSD) && __FreeBSD__ >= 7) && defined(__DragonFly__)
/* On the FreeBSD system, any target system at major version 7 shall */
/* have dl_iterate_phdr; therefore, we need not make it weak as above. */
# define HAVE_DL_ITERATE_PHDR
@@ -646,7 +652,8 @@
return(0);
}
if( cachedResult == 0 ) {
-# if defined(NETBSD) && defined(RTLD_DI_LINKMAP)
+/* HACK */
+# if defined(__DragonFly__) && 1
struct link_map *lm = NULL;
if (!dlinfo(RTLD_SELF, RTLD_DI_LINKMAP, &lm))
cachedResult = lm;
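
One of the built-in interfaces the diff touches is dl_iterate_phdr(), which reports every loaded ELF object together with its program headers. The sketch below assumes a glibc-style <link.h> that provides the ElfW() macro (on other systems the type may need to be spelled Elf_Phdr). The toy_* names are hypothetical and this is not the code Boehm GC uses. It lists each writable PT_LOAD segment, which is the kind of region a collector would want to register as a root.

```c
#define _GNU_SOURCE  /* glibc declares dl_iterate_phdr only with this defined */
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Hypothetical tally of what the walk found. */
struct toy_scan {
    int objects;            /* loaded ELF objects visited */
    int writable_segments;  /* writable PT_LOAD segments seen */
};

/* Callback invoked once per loaded object (main program, shared libraries). */
static int toy_visit(struct dl_phdr_info *info, size_t size, void *data)
{
    struct toy_scan *scan = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";

    (void)size;
    assert(scan != NULL);
    scan->objects++;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        /* Writable loadable segments hold .data and .bss. */
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_W)) {
            scan->writable_segments++;
            printf("%s: data segment at %p, %lu bytes\n", name,
                   (void *)(info->dlpi_addr + ph->p_vaddr),
                   (unsigned long)ph->p_memsz);
        }
    }
    return 0;  /* a non-zero return would stop the iteration */
}

/* Walk all loaded objects and return the tally. */
struct toy_scan toy_scan_objects(void)
{
    struct toy_scan scan = {0, 0};

    dl_iterate_phdr(toy_visit, &scan);
    return scan;
}
```

Calling toy_scan_objects() from a dynamically linked program prints one line per data segment, including those of shared libraries, without needing to locate _DYNAMIC or a DT_DEBUG entry by hand.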

#8 Updated by c.turner1 about 2 years ago

Also: this was on gc-7.2d.tar.gz fetched from upstream today rather than 7.1 as
outlined in the original report.

#9 Updated by c.turner1 about 2 years ago

Ok - a smidge cleaner patch is attached which also passes tests for threading when threading
is enabled.

Digging around in the #ifdefs, it looks like the FreeBSD support is somewhat crufty and goes
back quite a ways (e.g. it supports non-ELF FreeBSDs as well as more modern ones >= v7), with the DragonFly support tacked on to FreeBSD. Ideally at this stage, since our threads, ABI, and toolchain have diverged somewhat, I'd break out the DragonFly config separately, but my initial effort in this direction didn't quite work. So for now, this was a 'hack it to compile and see what happens' approach, borrowing from the NetBSD threading ifdefs on the threading side (as there are some comments about FreeBSD threads not being supported). NetBSD threads had their own hacks, though, so it still might not be 100%.

I'll try to build lang/ecl with threads and see what happens when I run a test
which exercises threads and the gc.

If this has issues, then I'll probably try to clean up the non-threaded approach, and then add more 'discrete' (not hooked into the NetBSD threading ifdefs) support for DF threads.

#10 Updated by c.turner1 about 2 years ago

Well, unfortunately threading doesn't appear to be a 'hack it and it works' kind
of an issue (as mostly expected).

ecl builds 99% of the way through to the link stage but, based on a truss run, triggers a GC during its link command, and due to some ifdeffery there is some kind of threaded 'GC stop/start world'
signal mismatch going on. I will need to dig deeper on this, but not today.

This might be different for non-threaded operation, but my 'hack the files' level of understanding on this one is too low to say.
