Bug #2506
panic: unmount: dangling vnode running poudriere
| Status: | Resolved | Start date: | 02/05/2013 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 100% |
| Category: | - | | |
| Target version: | - | | |
Description
I was running four poudriere jobs on a Xeon E3 (8 threads) system running DragonFly 3.3, and it panicked after 30 hours.
Backtrace:
(kgdb) #0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:265
#2 0xffffffff804f5d12 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:913
#3 0xffffffff804f6376 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:373
#4 0xffffffff804f662d in panic (fmt=0xffffffff80906deb "%s")
at /usr/src/sys/kern/kern_shutdown.c:819
#5 0xffffffff808b6408 in trap_fatal (frame=0xffffffe2cb503628,
eva=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/trap.c:1029
#6 0xffffffff808b6f11 in trap (frame=0xffffffe2cb503628)
at /usr/src/sys/platform/pc64/x86_64/trap.c:754
#7 0xffffffff808a0cdf in calltrap ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:188
#8 0xffffffff808b0eb9 in db_read_bytes (addr=7165064483209180471, size=8,
data=0xffffffe2cb503708 "")
at /usr/src/sys/platform/pc64/x86_64/db_interface.c:240
#9 0xffffffff802a735d in db_get_value (addr=7165064483209180471, size=8,
is_signed=0) at /usr/src/sys/ddb/db_access.c:58
#10 0xffffffff808b1b55 in db_nextframe (ip=<optimized out>,
fp=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/db_trace.c:234
#11 db_stack_trace_cmd (addr=<optimized out>, have_addr=<optimized out>,
count=<optimized out>, modif=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:440
#12 0xffffffff808b1d17 in print_backtrace (count=-883935480)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:452
#13 0xffffffff804f65f8 in panic (
fmt=0xffffffff8097713b "unmount: dangling vnode")
at /usr/src/sys/kern/kern_shutdown.c:812
#14 0xffffffff80577a57 in dounmount (mp=0xffffffe27a209700, flags=-883935000)
at /usr/src/sys/kern/vfs_syscalls.c:807
#15 0xffffffff80577c2c in sys_unmount (uap=0xffffffe2cb503a08)
at /usr/src/sys/kern/vfs_syscalls.c:622
#16 0xffffffff808b7683 in syscall2 (frame=0xffffffe2cb503ab8)
at /usr/src/sys/platform/pc64/x86_64/trap.c:1238
#17 0xffffffff808a0f2b in Xfast_syscall ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:323
#18 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Core dump and associated files available on leaf:~ftigeot/crash/crash.dangling_vnode
History
Updated by vsrinivas 3 months ago
At first glance, it looked as though a procfs mount was being unmounted
and the check that no vnodes were still linked into the mount
tripped. However, in the core, the vnode list on the procfs mount was
empty.
I think there is a race between dounmount() and insmntque(vp, NULL)
removing vnodes from a mount. dounmount() synchronizes via the
per-mount lockmgr lock, while insmntque() expects the vnode list to be
protected by the mntvnode token. Could you try holding the
mntvnode_token throughout the span of the dounmount() function?
Thanks,
-- vs;
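The locking change suggested above can be illustrated with a minimal userspace sketch. This is not the DragonFly kernel code: every name prefixed with toy_ is hypothetical, pthread mutexes stand in for the per-mount lockmgr lock and the global mntvnode token, and the function returns -1 where the kernel would panic. The point is only that the emptiness check in the unmount path holds the same lock that the list-manipulation path uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct toy_mount;

/* Hypothetical stand-ins for the kernel's vnode and mount structures. */
struct toy_vnode {
    struct toy_vnode *next;   /* next vnode on the same mount */
    struct toy_mount *mount;  /* mount this vnode is linked to, or NULL */
};

struct toy_mount {
    pthread_mutex_t   lock;   /* stands in for the per-mount lockmgr lock */
    struct toy_vnode *vnodes; /* list inspected by the unmount check */
};

/* Stands in for the global mntvnode token (an assumption of this sketch). */
static pthread_mutex_t toy_mntvnode_token = PTHREAD_MUTEX_INITIALIZER;

void toy_mount_init(struct toy_mount *mp)
{
    pthread_mutex_init(&mp->lock, NULL);
    mp->vnodes = NULL;
}

/* Move vp to mount mp; mp == NULL just unlinks it from its current mount. */
void toy_insmntque(struct toy_vnode *vp, struct toy_mount *mp)
{
    pthread_mutex_lock(&toy_mntvnode_token);
    if (vp->mount != NULL) {
        /* Find the link that points at vp and splice vp out. */
        struct toy_vnode **pp = &vp->mount->vnodes;
        while (*pp != NULL && *pp != vp)
            pp = &(*pp)->next;
        assert(*pp == vp);
        *pp = vp->next;
        vp->next = NULL;
    }
    vp->mount = mp;
    if (mp != NULL) {
        vp->next = mp->vnodes;
        mp->vnodes = vp;
    }
    pthread_mutex_unlock(&toy_mntvnode_token);
}

/*
 * Returns 0 if the mount has no vnodes left, -1 if one is still linked
 * (the case where the kernel panics with "unmount: dangling vnode").
 * The list is checked while holding the same token toy_insmntque() takes,
 * so a concurrent unlink can no longer be observed half-done.
 */
int toy_dounmount(struct toy_mount *mp)
{
    int dangling;

    pthread_mutex_lock(&mp->lock);
    pthread_mutex_lock(&toy_mntvnode_token);
    dangling = (mp->vnodes != NULL);
    pthread_mutex_unlock(&toy_mntvnode_token);
    pthread_mutex_unlock(&mp->lock);
    return dangling ? -1 : 0;
}
```

In this sketch the lock order is always per-mount lock first, then the token, and toy_insmntque() takes only the token, so the two paths cannot deadlock against each other.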
Updated by ftigeot 3 months ago
- File patch-dounmount.txt added
- % Done changed from 0 to 50
Thanks for having a look!
Would such a patch be acceptable? I have not removed the existing locking directives, which
may make things unnecessarily slow.
Updated by vsrinivas 3 months ago
Looks fine; dounmount() isn't called in performance-critical paths,
and insmntque() (which is closer to hot paths) is already using the
global mntvnode token.
We may eventually want to move insmntque() to per-mount locks and clear
up exactly which locks cover which fields in the mount structures.
-- vs;