Bug #2506

panic: unmount: dangling vnode running poudriere

Added by ftigeot almost 2 years ago. Updated almost 2 years ago.

Status:         Resolved
Start date:     02/05/2013
Priority:       Normal
Due date:       -
Assignee:       -
% Done:         100%
Category:       -
Target version: -

Description

I was running 4 poudriere jobs on a Xeon-E3 (8 threads) / DragonFly 3.3 system and it panicked after 30 hours.

Backtrace:

(kgdb) #0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:265
#2 0xffffffff804f5d12 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:913
#3 0xffffffff804f6376 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:373
#4 0xffffffff804f662d in panic (fmt=0xffffffff80906deb "%s")
at /usr/src/sys/kern/kern_shutdown.c:819
#5 0xffffffff808b6408 in trap_fatal (frame=0xffffffe2cb503628,
eva=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/trap.c:1029
#6 0xffffffff808b6f11 in trap (frame=0xffffffe2cb503628)
at /usr/src/sys/platform/pc64/x86_64/trap.c:754
#7 0xffffffff808a0cdf in calltrap ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:188
#8 0xffffffff808b0eb9 in db_read_bytes (addr=7165064483209180471, size=8,
data=0xffffffe2cb503708 "")
at /usr/src/sys/platform/pc64/x86_64/db_interface.c:240
#9 0xffffffff802a735d in db_get_value (addr=7165064483209180471, size=8,
is_signed=0) at /usr/src/sys/ddb/db_access.c:58
#10 0xffffffff808b1b55 in db_nextframe (ip=<optimized out>,
fp=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/db_trace.c:234
#11 db_stack_trace_cmd (addr=<optimized out>, have_addr=<optimized out>,
count=<optimized out>, modif=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:440
#12 0xffffffff808b1d17 in print_backtrace (count=-883935480)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:452
#13 0xffffffff804f65f8 in panic (
fmt=0xffffffff8097713b "unmount: dangling vnode")
at /usr/src/sys/kern/kern_shutdown.c:812
#14 0xffffffff80577a57 in dounmount (mp=0xffffffe27a209700, flags=-883935000)
at /usr/src/sys/kern/vfs_syscalls.c:807
#15 0xffffffff80577c2c in sys_unmount (uap=0xffffffe2cb503a08)
at /usr/src/sys/kern/vfs_syscalls.c:622
#16 0xffffffff808b7683 in syscall2 (frame=0xffffffe2cb503ab8)
at /usr/src/sys/platform/pc64/x86_64/trap.c:1238
#17 0xffffffff808a0f2b in Xfast_syscall ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:323
#18 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Core dump and associated files available on leaf:~ftigeot/crash/crash.dangling_vnode

patch-dounmount.txt (1.2 KB) ftigeot, 02/07/2013 01:47 AM

History

#1 Updated by vsrinivas almost 2 years ago

At first glance, it seemed like a procfs mount was being unmounted
and the test that there were no vnodes linked into the mount
tripped. However, in the core, the vnode list on the procfs mount
was empty.

I think there is a race between dounmount() and insmntque(, NULL)
removing vnodes from a mount. dounmount() synchronizes via the
per-mount lockmgr lock, while insmntque() expects the vnode list to
be protected by the mntvnode token. Could you try taking the
mntvnode_token throughout the span of the dounmount() function?

Thanks,
-- vs;

#2 Updated by ftigeot almost 2 years ago

Thanks for having a look!

Would such a patch be acceptable? I have not removed the existing locking
directives, which may make things unnecessarily slow.

#3 Updated by vsrinivas almost 2 years ago

On Thu, Feb 07, 2013 at 01:47:46AM -0800, Francois Tigeot via Redmine wrote:
>
>Issue #2506 has been updated by ftigeot.
>
>File patch-dounmount.txt added
>% Done changed from 0 to 50
>
>Thanks for having a look!
>
>Would such a patch be acceptable? I have not removed the existing locking
>directives, which may make things unnecessarily slow.

Looks fine; dounmount() isn't called in performance-critical paths,
and insmntque() (which is closer to hot paths) already uses the
global mntvnode token.

We may eventually want to move insmntque() to per-mount locks and
clear up exactly which fields of the mount structure are covered by
which locks.

-- vs;

#4 Updated by ftigeot almost 2 years ago

  • Status changed from New to Resolved
  • % Done changed from 50 to 100

Fixed by commit effb40c8d1ffed9a461d4136a276566af7e29aaa
