Bug #2506

panic: unmount: dangling vnode running poudriere

Added by ftigeot almost 2 years ago. Updated almost 2 years ago.

Status:         Resolved
Start date:     02/05/2013
Priority:       Normal
Due date:       -
Assignee:       -
% Done:         100%
Category:       -
Target version: -

Description

I was running 4 poudriere jobs on a Xeon-E3 (8 threads) / DragonFly 3.3 system and it panicked after 30 hours.

Backtrace:

(kgdb) #0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:265
#2 0xffffffff804f5d12 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:913
#3 0xffffffff804f6376 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:373
#4 0xffffffff804f662d in panic (fmt=0xffffffff80906deb "%s")
at /usr/src/sys/kern/kern_shutdown.c:819
#5 0xffffffff808b6408 in trap_fatal (frame=0xffffffe2cb503628,
eva=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/trap.c:1029
#6 0xffffffff808b6f11 in trap (frame=0xffffffe2cb503628)
at /usr/src/sys/platform/pc64/x86_64/trap.c:754
#7 0xffffffff808a0cdf in calltrap ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:188
#8 0xffffffff808b0eb9 in db_read_bytes (addr=7165064483209180471, size=8,
data=0xffffffe2cb503708 "")
at /usr/src/sys/platform/pc64/x86_64/db_interface.c:240
#9 0xffffffff802a735d in db_get_value (addr=7165064483209180471, size=8,
is_signed=0) at /usr/src/sys/ddb/db_access.c:58
#10 0xffffffff808b1b55 in db_nextframe (ip=<optimized out>,
fp=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/db_trace.c:234
#11 db_stack_trace_cmd (addr=<optimized out>, have_addr=<optimized out>,
count=<optimized out>, modif=<optimized out>)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:440
#12 0xffffffff808b1d17 in print_backtrace (count=-883935480)
at /usr/src/sys/platform/pc64/x86_64/db_trace.c:452
#13 0xffffffff804f65f8 in panic (
fmt=0xffffffff8097713b "unmount: dangling vnode")
at /usr/src/sys/kern/kern_shutdown.c:812
#14 0xffffffff80577a57 in dounmount (mp=0xffffffe27a209700, flags=-883935000)
at /usr/src/sys/kern/vfs_syscalls.c:807
#15 0xffffffff80577c2c in sys_unmount (uap=0xffffffe2cb503a08)
at /usr/src/sys/kern/vfs_syscalls.c:622
#16 0xffffffff808b7683 in syscall2 (frame=0xffffffe2cb503ab8)
at /usr/src/sys/platform/pc64/x86_64/trap.c:1238
#17 0xffffffff808a0f2b in Xfast_syscall ()
at /usr/src/sys/platform/pc64/x86_64/exception.S:323
#18 0x000000000000002b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Core dump and associated files available on leaf:~ftigeot/crash/crash.dangling_vnode

patch-dounmount.txt (1.2 KB) ftigeot, 02/07/2013 01:47 AM

History

#1 Updated by vsrinivas almost 2 years ago

At first glance, it seemed like a procfs mount was being unmounted
and the test that there were no vnodes linked into the mount
tripped. However, in the core, the vnode list on the procfs mount
was empty.

I think there is a race between dounmount() and insmntque(, NULL)
removing vnodes from a mount. dounmount() synchronizes via the
per-mount lockmgr lock, while insmntque() expects the vnode list to
be protected by the mntvnode token. Could you try taking the
mntvnode_token throughout the span of the dounmount() function?

Thanks,
-- vs;

#2 Updated by ftigeot almost 2 years ago

Thanks for having a look!

Would such a patch be acceptable? I have not removed the existing locking
directives, which may make things unnecessarily slow.

#3 Updated by vsrinivas almost 2 years ago

On Thu, Feb 07, 2013 at 01:47:46AM -0800, Francois Tigeot via Redmine wrote:
>
>Issue #2506 has been updated by ftigeot.
>
>File patch-dounmount.txt added
>% Done changed from 0 to 50
>
>Thanks for having a look!
>
>Would such a patch be acceptable? I have not removed the existing locking
>directives, which may make things unnecessarily slow.

Looks fine; dounmount() isn't called in performance-critical paths,
and insmntque() (which is closer to hot paths) already uses the
global mntvnode token.

We may eventually want to move insmntque() to per-mount locks and
clear up exactly which fields of the mount structure are covered by
which locks.

-- vs;

#4 Updated by ftigeot almost 2 years ago

  • Status changed from New to Resolved
  • % Done changed from 50 to 100

Fixed by commit effb40c8d1ffed9a461d4136a276566af7e29aaa
