Project

General

Profile

Actions

Bug #3221

closed

Assertions hit in the mtmagazine.

Added by yellowrabbit2010 almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Other
Target version:
-
Start date:
01/31/2020
Due date:
% Done:

0%

Estimated time:

Description

Hello,
Sorry that I can’t figure out the exact sequence when this happens, the process looks random. And I can work in the same gimp for half a day and not encounter this error, and sometimes I can’t even start it.
I can compile the package, in this case textproc/groff: ================
/bin/mkdir p ./tmac
sed -e "s;[]PNMTOPS_NOSETPAGE[];pnmtops;g" \
./tmac/www.tmac.in > ./tmac/www.tmac
/bin/mkdir -p `dirname doc/webpage.ps` \
&& LANG=C LC_ALL=C sed -e "s;[]VERSION[];1.22.4;" doc/webpage.ms | GROFF_COMMAND_PREFIX= GROFF_BIN_PATH="/usr/obj/dports/textproc/groff/groff-1.22.4" /usr/obj/dports/textproc/groff/groff-1.22.4/groff -I/usr/obj/dports/textproc/groff/groff-1.22.4 -M /usr/obj/dports/textproc/groff/groff-1.22.4/doc -M/usr/obj/dports/textproc/groff/groff-1.22.4/tmac -M/usr/obj/dports/textproc/groff/groff-1.22.4/tmac -F/usr/obj/dports/textproc/groff/groff-1.22.4/font -F/usr/obj/dports/textproc/groff/groff-1.22.4/font -Upet -ww -Tps -ms -mwww >doc/webpage.ps
assertion: mp
>rounds != 0 in mtmagazine_alloc
gmake2: * [Makefile:11742: doc/webpage.ps] Error 1
gmake2: Leaving directory '/usr/obj/dports/textproc/groff/groff-1.22.4'
gmake1: *
[Makefile:5600: all] Error 2
gmake1: Leaving directory '/usr/obj/dports/textproc/groff/groff-1.22.4'
  • Error code 1

Stop.
make: stopped in /usr/dports/textproc/groff
fly# ================

or i can just try to run gimp ================
rabbit@fly /usr/src% gimp
assertion: MAGAZINE_NOTFULL(mp) in mtmagazine_free
gimp: terminated: Terminated ================

I'm at the master branch, commit d0e99d5ddb6a001850be2dff41813e1cad00b5af.


Files

IMG_20200203_090633_HDR-min.jpg (1.06 MB) IMG_20200203_090633_HDR-min.jpg memtest screen yellowrabbit2010, 02/03/2020 02:36 PM
IMG_20200204_191847_HDR.jpg (3.79 MB) IMG_20200204_191847_HDR.jpg Acer c720 gdb screenshot yellowrabbit2010, 02/04/2020 01:45 AM
Actions #1

Updated by yellowrabbit2010 almost 5 years ago

v5.7.0-1344-g3af807019f

Since it seems like some of the crashes are related to tp->mags0.prev, I tried to set a watchpoint for this field. I don’t know how well it turned out. If I did something wrong, then correct me.
I built libc and dntpd (yeah, this is another program that breaks, I just did not notice it because it starts) with debug information.

======================================================
Reading symbols from /usr/sbin/dntpd...done.
(gdb) startx
Undefined command: "startx". Try "help".
(gdb) start
Temporary breakpoint 1 at 0x401b10: file /usr/src/usr.sbin/dntpd/main.c, line 65.
Starting program: /usr/sbin/dntpd

Temporary breakpoint 1, main (ac=1, av=0x7fffffdfd660)
at /usr/src/usr.sbin/dntpd/main.c:65
65 int test_opt = 0;
(gdb) break mtmagazine_alloc
Breakpoint 2 at 0x800539f0b: file /usr/src/lib/libc/../libc/stdlib/nmalloc.c, line 1682.
(gdb) cont
Continuing.

Breakpoint 2, mtmagazine_alloc (zi=4, flags=0)
at /usr/src/lib/libc/../libc/stdlib/nmalloc.c:1682
1682 tp = &thread_mags;
(gdb) n
1683 if (tp->init < 0)
(gdb) p tp->mags
$1 = {{loaded = 0x0, prev = 0x0} <repeats 72 times>}
(gdb) p tp->mags0
$2 = {loaded = 0x0, prev = 0x0}
(gdb) p tp->mags0.prev
$3 = (struct magazine ) 0x0
(gdb) p &tp->mags0.prev
$4 = (struct magazine *
) 0x80067c508
(gdb) watch *0x80067c508
Hardware watchpoint 3: *0x80067c508
(gdb) cont
Continuing.

Breakpoint 2, mtmagazine_alloc (zi=55, flags=5)
at /usr/src/lib/libc/../libc/stdlib/nmalloc.c:1682
1682 tp = &thread_mags;
(gdb)
Continuing.
Hardware watchpoint 3: *0x80067c508

Old value = 0
New value = -2109017
__strtok_r (s=0x7fffffdfd1a7 "0.dragonfly.pool.ntp.org\n",
delim=0x40613f " \t\r\n", last=0x80067c508)
at /usr/src/lib/libc/../libc/string/strtok.c:93
93 return (tok);
(gdb) ======================================

Actions #2

Updated by yellowrabbit2010 almost 5 years ago

Continuation of the debugging session (another computer). In short: variable last (lib/libc/string/strtok.c:111) defined as
---
static __thread char *last TLS_ATTRIBUTE;
---
has same runtime address (0x80067c508) as thread_mags.mags0.prev defined as
--- lib/libc/stdlib/nmalloc.c:327
static __thread thr_mags thread_mags TLS_ATTRIBUTE;
---

The last thing I have to try is to make a flash drive with dfly-x86_64-5.6.2 completely from scratch, update it to the master and check that the dntpd does not die silently.

====================================================================
(gdb) bt
#0 _strtok_r (s=0x7fffffdfd617 "0.dragonfly.pool.ntp.org\n",
delim=0x406124 " \t\r\n", last=0x80067c508)
at /usr/src/lib/libc/../libc/string/strtok.c:93
#1 0x00000008004f34cd in strtok (s=0x7fffffdfd610 "server",
delim=0x406124 " \t\r\n") at /usr/src/lib/libc/../libc/string/strtok.c:113
#2 0x0000000000402627 in process_config_file (
path=0x405af8 "/etc/dntpd.conf") at main.c:417
#3 0x0000000000401f48 in main (ac=1, av=0x7fffffdfdad8) at main.c:198
(gdb)
(gdb) list
88 if (c == 0)
89 s = NULL;
90 else
91 s[-1] = '\0';
92 last = s;
93 return (tok);
94 }
95 } while (sc != 0);
96 }
97 /
NOTREACHED /
(gdb) p last
$5 = (char *
restrict) 0x80067c508
(gdb) cont
Continuing.
Hardware watchpoint 3: *0x80067c508
Old value = 2107881
New value = -2107856
_strtok_r (s=0x7fffffdfd630 "", delim=0x406124 " \t\r\n", last=0x80067c508)
at /usr/src/lib/libc/../libc/string/strtok.c:93
93 return (tok);
(gdb) cont
Continuing.
Breakpoint 2, mtmagazine_alloc (zi=15, flags=0)
at /usr/src/lib/libc/../libc/stdlib/nmalloc.c:1683
1683 if (tp>init < 0)
(gdb) p tp
$6 = (thr_mags *) 0x80067c500
(gdb) p tp->mags
$7 = {{loaded = 0x0, prev = 0x7fffffdfd630}, {loaded = 0x0, prev = 0x0}, {
loaded = 0x0, prev = 0x0}, {loaded = 0x0, prev = 0x0}, {
loaded = 0x8006b1000, prev = 0x0}, {loaded = 0x0,
prev = 0x0} <repeats 67 times>}
(gdb) ==================================================================

Actions #3

Updated by dillon almost 5 years ago

Hmm. If the process is random then it could be a hardware issue / memory corruption of some sort. Do you get other errors or is it always a mymagazine error ?

-Matt

Actions #4

Updated by yellowrabbit2010 almost 5 years ago

Well, the memory chips do not show errors and the temperature is not so high (~ 50C).
Since I now found an always recurring case with the system program dntpd, then I just need the installation image + upgrade to the master to check it on another computer. Unfortunately compilation with flash will take time. I will report the result later.

Actions #5

Updated by yellowrabbit2010 almost 5 years ago

Ok, report.
Acer c720 notebook, USB flash with dd'ed dfly-x86_64-5.6.2_REL.img. Make aux directory /mnt/usr-obj.
Add new partition (number 3), size = 6G for /usr/src and /usr/obj.
/etc/fstab
/dev/part-by-label/DragonFly_v5.6.2.a / ufs rw,noatime 0 1
/dev/serno/1590261180302284762.s3a /mnt/src-obj ufs rw,noatime 1 1
/mnt/src-obj/src /usr/src null rw 0 0
/mnt/src-obj/obj /usr/obj null rw 0 0
============
I don’t want to make a lot of changes to the reference image, so I don’t configure the network and the like, I just pick up the latest sources in .s3a/src on a big machine:
in mounted .s3a/src
git clone git://git.dragonflybsd.org/dragonfly.git .
==

Boot C720 from USB

root

cd /usr/src
make buildworld
make buildkernel KERNCONF=X86_64_GENERIC
make installkernel KERNCONF=X86_64_GENERIC
make installworld
make upgrade
reboot
==

add debug info to dntpd and libc

cd /usr/src/usr.sbin
make DEBUG_FLAGS=-g all
make DEBUG_FLAGS=-g install
cd /usr/src/lib/libc
env CFLAGS=-O0 make DEBUG_FLAGS=-g all
env CFLAGS=-O0 make DEBUG_FLAGS=-g install
=
  1. gdb q `which dntpd`
    Reading symbols from /usr/sbin/dntpd...done.
    (gdb) start
    Temporary breakpoint 1 at 0x401c13: file main.c, line 64.
    Starting program: /usr/sbin/dntpd
    Temporary breakpoint 1, main (ac=1, av=0x7fffffdfd910) at main.c:64
    64 {
    (gdb) break nmalloc.c:1683
    Breakpoint 2 at 0x800510eb2: file /usr/src/lib/libc/../libc/stdlib/nmalloc.c, li
    ne 1683.
    (gdb) cont
    Continuing.
    Breakpoint 2, mtmagazine_alloc (zi=4, flags=0)
    at /usr/src/lib/libc/../libc/stdlib/nmalloc.c:1683
    1683 if (tp
    >init < 0)
    (gdb) p &(tp->mags0.prev)
    $1 = (struct magazine *) 0x800653508
    (gdb) del break 2
    (gdb) break __strtok_r
    Breakpoint 3 at 0x8004ca303: file /usr/src/lib/libc/../libc/string/strtok.c, lin
    e 60.
    (gdb) cont
    Continuing.
    Breakpoint 3, __strtok_r (s=0x7fffffdfd460 "", delim=0x4046d8 " \t\r\n",
    last=0x800653508) at /usr/src/lib/libc/../libc/string/strtok.c:60
    60 if (s NULL && (s = *last) NULL)
    (gdb) p last
    $2 = (char *
    restrict) 0x800653508
    (gdb)
    =
debug

See last 0x800653508 &(tp->mags0.prev) ?
I come to the conclusion that I'm looking somewhere wrong. After all, it cannot be that from a clean installation image with an absolute minimum of changes on two computers of different classes an address overlap is obtained only for me.
I clearly make a mistake somewhere since this is not repeated by anyone else:(

Actions #6

Updated by yellowrabbit2010 almost 5 years ago

  • Status changed from New to Resolved

My bad, speaking of a ``clean'' system, I did not take into account the changes in the conf files, but only the absence of any additional packages.
I managed to bring my system to normal.

TL;DR Some contents of the /etc/make.conf file may lead to ``incorrect'' contents of the /lib/libc.so.8 library.

  • I will use the terms correct and incorrect in the sense of whether overlap of the ``last'' and ``thread_mags'' variables occurs in my system, and not in the sense of the libc.so.8 library being correct as a shared object file.

Branch master, commit 5be1fc4638827e5ad8f6f4a3a85c4466f559d760

Here's what the correct relocations in the library look like: =========
fly# readelf -r /usr/obj/usr/src/world_x86_64/usr/src/lib/libc/libc.so.8 | fgrep mags
0000001376a0 000500000012 R_X86_64_TPOFF64 0000000000000020 thread_mags + 0

fly# readelf -r /usr/obj/usr/src/world_x86_64/usr/src/lib/libc/libc.so.8 | fgrep last
000000137520 000100000012 R_X86_64_TPOFF64 0000000000000000 last.2332 + 0
000000138418 080200000007 R_X86_64_JUMP_SLO 0000000000054750 updlastlogx@@DF404.0 + 0

fly# objdump -d /usr/obj/usr/src/world_x86_64/lib/libc.so.8 | grep -A 7 '<strtok[^t]*>:'
00000000000384d2 <strtok>:
384d2: 48 83 ec 08 sub $0x8,%rsp
384d6: 48 8b 15 43 f0 0f 00 mov 0xff043(%rip),%rdx # 137520 <_DYNAMIC+0x238>
384dd: 64 48 03 14 25 00 00 add %fs:0x0,%rdx
384e4: 00 00
384e6: e8 d5 b3 ff ff callq 338c0 <__strtok_r@plt>
384eb: 48 83 c4 08 add $0x8,%rsp
384ef: c3 retq =========
As you can see, the ``last'' variable corresponds to one relocation record, it has the correct type (R_X86_64_TPOFF64) and the offset (137520) contains exactly the command that needs to be adjusted at load time (inside the strtok function, ``mov 0xff043(%rip),%rdx'').

Incorrect relocations: =========
rabbit@fly ~% readelf -r /usr/obj/usr/src/world_x86_64/lib/libc.so.8|fgrep mags
0000001979e8 000500000012 R_X86_64_TPOFF64 0000000000000020 thread_mags + 0

rabbit@fly ~% readelf -r /usr/obj/usr/src/world_x86_64/lib/libc.so.8|fgrep last
000000197a70 000100000010 R_X86_64_DTPMOD64 0000000000000000 last.2332 + 0
000000197d00 000100000010 R_X86_64_DTPMOD64 0000000000000000 last.2332 + 0
000000198770 080300000007 R_X86_64_JUMP_SLO 0000000000063022 updlastlogx@@DF404.0 + 0

rabbit@fly ~% objdump -d /usr/obj/usr/src/world_x86_64/lib/libc.so.8|grep -A 16 '<strtok[^t]*>:'
0000000000039497 <strtok>:
39497: 55 push %rbp
39498: 48 89 e5 mov %rsp,%rbp
3949b: 48 83 ec 10 sub $0x10,%rsp
3949f: 48 89 7d f8 mov %rdi,-0x8(%rbp)
394a3: 48 89 75 f0 mov %rsi,-0x10(%rbp)
394a7: 64 48 8b 14 25 00 00 mov %fs:0x0,%rdx
394ae: 00 00
394b0: 48 8b 05 e9 e3 15 00 mov 0x15e3e9(%rip),%rax # 1978a0 <_DYNAMIC+0x238>
394b7: 48 01 c2 add %rax,%rdx
394ba: 48 8b 4d f0 mov -0x10(%rbp),%rcx
394be: 48 8b 45 f8 mov -0x8(%rbp),%rax
394c2: 48 89 ce mov %rcx,%rsi
394c5: 48 89 c7 mov %rax,%rdi
394c8: e8 13 a8 ff ff callq 33ce0 <__strtok_r@plt>
394cd: c9 leaveq
394ce: c3 retq =========
Here we have for some reason two relocation entries that are of the wrong type (R_X86_64_DTPMOD64) and point (197a70) to a function ``__lpmap_map'' that has nothing to do with the ``last'' variable from the strtok function. =========
0000000000096e65 <__lpmap_map>:
96e65: 55 push %rbp
96e66: 48 89 e5 mov %rsp,%rbp
96e69: 48 83 ec 30 sub $0x30,%rsp
96e6d: 48 89 7d e8 mov %rdi,-0x18(%rbp)
96e71: 48 89 75 e0 mov %rsi,-0x20(%rbp)
96e75: 89 d0 mov %edx,%eax
96e77: 66 89 45 dc mov %ax,-0x24(%rbp)
96e7b: 66 48 8d 3d ed 0b 10 data16 lea 0x100bed(%rip),%rdi # 197a70 <_DYNAMIC+0x408>
96e82: 00
96e83: 66 66 48 e8 65 cd f9 data16 data16 callq 33bf0 <__tls_get_addr@plt>
96e8a: ff
96e8b: 8b 00 mov (%rax),%eax
96e8d: 85 c0 test %eax,%eax
96e8f: 0f 8f 08 01 00 00 jg 96f9d <__lpmap_map+0x138>
96e95: 66 48 8d 3d d3 0b 10 data16 lea 0x100bd3(%rip),%rdi # 197a70 <_DYNAMIC+0x408>
96e9c: 00
96e9d: 66 66 48 e8 4b cd f9 data16 data16 callq 33bf0 <__tls_get_addr@pl
... =========

Now a small table that shows the results of make build_all, depending on the /etc/make.conf file.
------------------------------------+ | /etc/make.conf | /lib/libc.so.8 |
------------------------------------+ | is absent | correct |
------------------------------------+ | is empty | correct |
------------------------------------+ | CFLAGS=-Werror | incorrect |
------------------------------------+

Actions

Also available in: Atom PDF