Bug #2353

panic: assertion "gd->gd_spinlocks_wr == 0" failed in bsd4_schedulerclock

Added by jaydg about 2 years ago. Updated over 1 year ago.

Status: In Progress
Start date: 04/27/2012
Priority: Normal
Due date:
Assignee: alexh
% Done: 0%
Category: -
Target version: -

Description

Version String: DragonFly v3.1.0.322.gc231b-DEVELOPMENT #0: Fri Mar 16 20:31:33 CET 2012

#0 _get_mycpu () at ./machine/thread.h:69
#1 md_dumpsys (di=<optimized out>) at /usr/src/sys/platform/pc64/x86_64/dump_machdep.c:263
#2 0xffffffff802fb2d2 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:937
#3 0xffffffff802fb936 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xffffffff802fbbed in panic (fmt=0xffffffff80585cf0 "assertion \"%s\" failed in %s at %s:%u") at /usr/src/sys/kern/kern_shutdown.c:843
#5 0xffffffff803036fb in bsd4_schedulerclock (lp=<optimized out>, period=<optimized out>, cpstamp=<optimized out>) at /usr/src/sys/kern/usched_bsd4.c:622
#6 0xffffffff802db16c in schedclock (info=0xffffffff820106e8, in_ipi=<optimized out>, frame=<optimized out>) at /usr/src/sys/kern/kern_clock.c:776
#7 0xffffffff802f9e03 in systimer_intr (timep=<optimized out>, in_ipi=<optimized out>, frame=0x0) at /usr/src/sys/kern/kern_systimer.c:102
#8 0xffffffff8054ad04 in lapic_timer_process_oncpu (gd=0xffffffff8200f000, frame=0x0) at /usr/src/sys/platform/pc64/apic/lapic.c:336
#9 0xffffffff8054ad3c in lapic_timer_process () at /usr/src/sys/platform/pc64/apic/lapic.c:342
#10 0xffffffff80540d69 in splz_timer () at /usr/src/sys/platform/pc64/x86_64/ipl.s:461
#11 0x0000000000000000 in ?? ()

Before dumping, several text blocks like this one were emitted:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic->id = 00000000
fault virtual address = 0x20
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff802ec7fc
stack pointer = 0x10:0xffffffe0a8b4a980
frame pointer = 0x10:0xffffffe0a8b4a9c8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 0, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 8139
current thread = pri 6 (CRIT)
kernel: type 12 trap, code=2

A dump is available.


Related issues

Related to Bug #2336: 3.0.3 catchall Resolved 03/26/2012
Related to Bug #2402: Showstopper panics for Release 3.2 New 08/15/2012
Related to Bug #2432: ral, ifconfig wlan0 up freeze Closed 10/11/2012
Related to Bug #2453: panic: assertion "gd->gd_spinlocks == 0" failed New 11/12/2012

History

#1 Updated by vsrinivas about 2 years ago

If you could upload the dump to leaf or someplace accessible, that would be very useful!

Thanks,
-- vs;

#2 Updated by jaydg about 2 years ago

I've uploaded the dump on leaf, ~jaydg/crash/2353.

#3 Updated by vsrinivas about 2 years ago

Some preliminary debugging:

We were in the 'moused' thread and were either interrupted or otherwise ended up calling splz. We were probably in a critical section; crit_exit can call splz() itself.

splz
++critcount from splz itself (exp critcount=1)

splz_timer
lapic_timer_process
lapic_timer_process_oncpu
systimer_intr
++crit_enter from systimer_intr (exp.cc=2)
++gd_syst_next from systimer_intr

--crit_enter from systimer_intr (exp.cc=1)

schedclock [first systimer]

lp = 'moused thread'

bsd4_schedulerclock

on CPU0; below rrinterval, no need_user_resched
[exp.critcount=1, found cc=4]

(pollclock)
(emergency_intr_timer_callback)
(hardclock)
(statclock)
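
For readers following the accounting above, here is a toy replay of the expected critical-section count. It is only a sketch: `struct model_td`, `model_crit_enter`, `model_crit_exit`, `replay_expected_critcount` and `unaccounted_critcount` are hypothetical names made up for illustration, not the kernel's own structures or functions, and the model assumes each step in the trace changes the count by exactly one.

```c
#include <assert.h>

/* Hypothetical stand-in for a thread; only the critical-section count is modeled. */
struct model_td {
    int critcount;
};

static void model_crit_enter(struct model_td *td) { td->critcount++; }
static void model_crit_exit(struct model_td *td)  { td->critcount--; }

/*
 * Replays the increments and decrements listed in the trace and returns
 * the count we would expect to see on entry to bsd4_schedulerclock.
 */
static int replay_expected_critcount(void)
{
    struct model_td td = { 0 };

    model_crit_enter(&td);  /* splz raises the count itself       (exp 1) */
    model_crit_enter(&td);  /* crit_enter in systimer_intr        (exp 2) */
    model_crit_exit(&td);   /* matching exit in systimer_intr     (exp 1) */

    return td.critcount;
}

/* Number of increments the trace does not account for, given the observed count. */
static int unaccounted_critcount(int found)
{
    return found - replay_expected_critcount();
}
```

With the observed value from the dump, `unaccounted_critcount(4)` comes out as 3, i.e. three increments that the call path above does not explain.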

#4 Updated by vsrinivas about 2 years ago

Okay,

The problem is that we hold an MTX spinlock while attempting to go to sleep. Specifically, we hold the syscons MTX spinlock at :769 and :771 of sys/dev/misc/syscons/syscons.c, around a device ioctl routine which may explicitly tsleep. The tsleep in question is reached via sysmouse_event, which ultimately hits kern_kevent and sleeps in kqueue.

First, why are we using MTX spinlocks at all?

Second, it is probably inappropriate to hold an MTX spinlock around the entire ioctl path here. The path takes the tty_token, among many other things, so the chances of blocking are high.

What is it synchronizing that the tty_token is not?
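
To make the pattern concrete, here is a minimal user-space sketch of the problem described above. It is not the syscons code and not a proposed fix: `struct model_gd`, `model_spin_lock`, `model_spin_unlock`, `model_blocking_ioctl` and the two wrapper functions are hypothetical names, and the counter only mimics the shape of the `gd->gd_spinlocks_wr == 0` check from the panic message. Where the kernel would panic, the model just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU globaldata; only the spinlock count is modeled. */
struct model_gd {
    int spinlocks_held;   /* plays the role of gd->gd_spinlocks_wr */
};

static void model_spin_lock(struct model_gd *gd)   { gd->spinlocks_held++; }
static void model_spin_unlock(struct model_gd *gd) { gd->spinlocks_held--; }

/*
 * Hypothetical ioctl body that may block. It reports whether blocking
 * would have been legal, i.e. whether no spinlocks were held at that point.
 */
static bool model_blocking_ioctl(const struct model_gd *gd)
{
    return gd->spinlocks_held == 0;
}

/* The pattern from the report: the spinlock is held around the whole ioctl. */
static bool ioctl_with_lock_held(struct model_gd *gd)
{
    bool ok;

    model_spin_lock(gd);
    ok = model_blocking_ioctl(gd);   /* may sleep while the spinlock is held */
    model_spin_unlock(gd);
    return ok;
}

/* One possible alternative: release the spinlock before anything that may block. */
static bool ioctl_with_lock_dropped(struct model_gd *gd)
{
    model_spin_lock(gd);
    /* ... touch only the state the spinlock actually protects ... */
    model_spin_unlock(gd);
    return model_blocking_ioctl(gd); /* no spinlock held here */
}
```

In this model `ioctl_with_lock_held` returns false and `ioctl_with_lock_dropped` returns true. Whether narrowing the lock is actually safe in syscons depends on the answer to the question above, namely what the spinlock protects that the tty_token does not.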

#5 Updated by alexh over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to alexh
