Bug #2597

panic building devel/gettext

Added by davshao 11 months ago. Updated 10 months ago.

Status: Closed
Start date: 10/20/2013
Priority: High
Due date:
Assignee: dillon
% Done: 0%
Category: -
Target version: -

Description

Using current master at least updated through
commit e8be1190de880a82da339c7b763cadb8bd48e960

Date: Fri Oct 18 10:59:57 2013 +0800

vga_pci: Fix cached resources cleanup and setup driver's softc size

build either dports or pkgsrc from scratch, then build, say, devel/gettext. On multiple machines there is a panic similar to:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; lapic.id = 02000000
fault virtual address = 0x18
fault code = supervisor read, page not present
instruction pointer = 0x8:0xa04c6b38
stack pointer = 0x10:0xba845d40
frame pointer = 0x10:0x1e
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12
<- SMP: XXX
kernel: type 12 trap, code=0

CPU2 stopping CPUs: 0x0000000b
stopped
Stopped at vn_syncer_add+0x1e: movl 0x18(%eax),%esi

The reason I mention m4 1.4.17 is that a recent upgrade appears to use __fpending, whose definition on DragonFly differs from the one apparently used on Linux. The most serious potential problem is that DragonFly's __fpending returns a signed value. The panics are not observed when building perl, or even when building world and kernel, so this does not appear to be a general problem with the new syncer code.

dports m4 is patched to paper over the differing signatures of __fpending, just to get m4 1.4.17 to build at all. I get the same panics after applying the dports patch to pkgsrc to get its m4 1.4.17 to build.
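To illustrate why a signed return value matters to a caller that expects an unsigned count, here is a minimal sketch. It does not use DragonFly's actual __fpending prototype or the dports patch; `pending_signed`, `pending_naive`, and `pending_clamped` are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a platform call that reports the number of
 * buffered bytes as a SIGNED value. This is an assumption for
 * illustration, not DragonFly's real __fpending. */
long pending_signed(long buffered)
{
    return buffered;
}

/* Naive adaptation: the caller treats the result as size_t.
 * A negative value wraps around to a huge unsigned count. */
size_t pending_naive(long buffered)
{
    return (size_t)pending_signed(buffered);
}

/* Defensive adaptation: clamp negative results to zero before
 * converting, so the caller never sees a wrapped value. */
size_t pending_clamped(long buffered)
{
    long n = pending_signed(buffered);
    return n < 0 ? 0 : (size_t)n;
}
```

With a non-negative count both adapters agree; they differ only when the signed call returns a negative value, which is the case a size_t-based caller cannot represent.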

It is possible m4 is blameless, but whatever the problem is, it seems fairly serious, since it manifests in the build path of git, which would be needed to fetch further updates to src or dports if there is a fix ...

History

#1 Updated by davshao 11 months ago

  • Subject changed from panic building devel/gettext post update to m4 1.4.17 to panic building devel/gettext

Actually, it appears that the panic when building, say, devel/gettext may occur even before m4 1.4.17 is built.

#2 Updated by marino 10 months ago

Are there any new developments here?
I've never seen a similar panic. The m4 patch has been in place for a while without complaint as well.

Is anyone else seeing this panic?

#3 Updated by davshao 10 months ago

I have bisected the problem to NOT occur with

commit 00891516deedeeb5735ac0fb922e0a2823334f55
Date: Mon Oct 14 16:34:32 2013 -0700

kernel - Use per-cpu token for deadlwps list

and to occur with

commit d83c62446a4a82336e1942dd088225f3e1d78891
Date: Mon Oct 14 16:54:37 2013 -0700

kernel - Fix bug last commit (2)

I have also observed that the systems where the problem occurs appear to have ufs as the filesystem, whereas some other systems with hammer do not exhibit the same problem.

#4 Updated by davshao 10 months ago

  • Priority changed from Normal to High

I believe this problem needs to be addressed before the 3.6 release because it may affect many if not all systems using the ufs filesystem.

/etc/make.conf:

CFLAGS+=-g
STRIP=
WITH_PKGNG=yes
DISABLE_VULNERABILITIES=yes
WITH_VIM_OPTIONS=yes

/etc/rc.conf:

nfs_reserved_port_only="YES"
sshd_enable="YES"
usbd_enable="YES"
nfs_client_enable="YES"
rpc_umntall_enable="NO"
dumpdev="/dev/serno/whatever.s1b" # via installer configuration
ifconfig_re0="DHCP" # via installer configuration
devd_enable="YES"
udevd_enable="YES"
dntpd_enable="YES"
rpcbind="YES"

/boot/loader.conf:

vm.dma_reserved=67108864

Source updated through:

commit 053e752c0cc4596f5a4cb2ec888143a29c7ecd9b
Date: Fri Nov 1 09:48:10 2013 -0700

hammer2 - stabilization

After full make -j7 buildworld buildkernel and install, using latest dports, and moving /usr/local and /var/db/pkg so that dports installation is from scratch:

# cd /usr/dports/ports-mgmt/pkg
# make clean && make BATCH=yes install clean && rehash
# cd /usr/dports/lang/perl5.18
# make clean && make BATCH=yes install clean && rehash
# cd /usr/dports/devel/gettext
# make clean && make BATCH=yes install clean && rehash

the last build panics, and often requires an fsck after reboot, with:

checking for strpbrk... (cached) yes
checking whether strstr works in linear time ...

Fatal trap 12: page fault while in kernel mode
cpuid = 2; lapic->id = 02000000
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff805df496
stack pointer = 0x10:0xffffffe26d7f3a30
frame pointer = 0x10:0xffffffe26d7f3a50
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
kernel: type 12 trap, code=0

CPU2 stopping CPUS: 0x0000000b
stopped
Stopped at vn_syncer_add+0x22: movq 0x30(%rax),%r12
db> trace
vn_syncer_add() at vn_syncer_add+0x22 0xffffffff805df496
syncer_thread() at syncer_thread+0xd7 0xffffffff805df758
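The fault address in this trace equals the displacement in the faulting instruction (movq 0x30(%rax),%r12), which is what a field read through a null base pointer looks like. The sketch below shows that arithmetic; `struct example_ctx`, `effective_address`, and `target_address` are hypothetical and do not reflect the real kernel structure involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real kernel structure: six 8-byte
 * words of padding place `target` at byte offset 0x30. */
struct example_ctx {
    uint64_t pad[6];
    uint64_t target;
};

/* Address touched by a base+displacement memory operand,
 * e.g. movq 0x30(%rax),%r12 reads from rax + 0x30. */
uintptr_t effective_address(uintptr_t base, uintptr_t disp)
{
    return base + disp;
}

/* Address the CPU would read for ctx->target, given the numeric
 * value of the ctx pointer. */
uintptr_t target_address(uintptr_t ctx_base)
{
    return effective_address(ctx_base, offsetof(struct example_ctx, target));
}
```

With a base of 0 the computed address is 0x30, matching the reported fault virtual address. The same reasoning applies to the i386 instruction movl 0x18(%eax),%esi with a displacement of 0x18.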

#5 Updated by marino 10 months ago

I can confirm these panics.
I just got two of them, out of 2 attempts, on an Oct 18 kernel/world (i386, UFS).

Now that I think about it, I had them before but I didn't make the connection until now.

#6 Updated by dillon 10 months ago

  • Status changed from New to Closed
  • Assignee set to dillon

Should now be fixed in 518cbb49745841d703ea07e413dcfe9b7cb0ce96.

-Matt
