Bug #2934

kernel (4.7) panic at buildworld: assertion "gd->gd_spinlocks == 0" failed

Added by peeter 8 months ago. Updated 8 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Kernel
Target version: -
Start date: 08/06/2016
Due date:
% Done: 80%


Description

The kernel (DragonFly v4.7.0.109.g609bcd-DEVELOPMENT) panics during "make buildworld".

/var/crash/info.1:
---
Panic String: assertion "gd->gd_spinlocks == 0" failed in dfly_schedulerclock at /usr/src/sys/kern/usched_dfly.c:757
---

kgdb output is attached.

kgdb.txt.1 - kgdb output (28.7 KB) peeter, 08/06/2016 01:46 PM

History

#1 Updated by dillon 8 months ago

  • Assignee set to dillon

Hmm. It looks like splz() was called while a spinlock was held, but the backtrace doesn't go far enough to see who called splz(). From kgdb you can manually spill the stack and write down any procedural symbols that you get.

frame 7
info regi
(extract contents of %rsp)
x/a 0x<value_of_rsp>

Then hit return repeatedly; each press repeats the x/a on the next stack slot. Write down any procedural symbols that you come across. The goal here is to figure out which code path called splz().

-Matt
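
For a long transcript, the "write down any procedural symbols" step can be automated. The following is a minimal sketch, assuming the kgdb session was saved as text and that each `x/a` line has the form `address: value <symbol+offset>`. The function name `code_symbols` and the `data_symbols` filter are made up for this illustration and are not part of kgdb.

```python
import re

# Matches one line of gdb `x/a` output, for example:
#   0xffffffe328baf638: 0xffffffff80618ef9 <lwkt_user_yield+37>
# The <symbol+offset> annotation is optional, and the offset is absent
# when the value points at the exact start of the symbol.
XA_LINE = re.compile(
    r"^(0x[0-9a-f]+):\s+(0x[0-9a-f]+)(?:\s+<([^>+]+)(?:\+(\d+))?>)?\s*$"
)

def code_symbols(transcript, data_symbols=()):
    """Return (stack_slot, symbol, offset) for every annotated stack slot.

    Names listed in data_symbols (for example global data objects) are
    skipped so that only candidate return addresses remain.
    """
    found = []
    for line in transcript.splitlines():
        m = XA_LINE.match(line.strip())
        if not m or m.group(3) is None:
            continue  # prompt line, or a value with no symbol annotation
        slot, _value, symbol, offset = m.groups()
        if symbol in data_symbols:
            continue
        found.append((int(slot, 16), symbol, int(offset or 0)))
    return found

# A short excerpt in the same shape as a kgdb x/a session.
sample = """\
(kgdb) x/a 0xffffffe328baf610
0xffffffe328baf610: 0xffffffe328baf640
(kgdb)
0xffffffe328baf618: 0xffffffff809f5ce4 <splz_timer+19>
(kgdb)
0xffffffe328baf620: 0x0
(kgdb)
0xffffffe328baf638: 0xffffffff80618ef9 <lwkt_user_yield+37>
(kgdb)
0xffffffe328baf678: 0xffffffff814bcdd8 <kernel_object+56>
"""

for slot, symbol, offset in code_symbols(sample, data_symbols={"kernel_object"}):
    print(f"{slot:#x}: {symbol}+{offset}")
```

The output is only a list of candidates. A symbol on the stack may be a stale value left over from an earlier call, so each one still has to be checked against the source.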

#2 Updated by peeter 8 months ago

x/a gives this (just hitting enter, i.e. not specifying an address after the first one). Does this help? I can go further if needed:

---
(kgdb) bt
#0 _get_mycpu () at ./machine/thread.h:69
#1 panic (fmt=fmt@entry=0xffffffff80b368e0 "assertion \"%s\" failed in %s at %s:%u") at /usr/src/sys/kern/kern_shutdown.c:833
#2 0xffffffff806145d2 in dfly_schedulerclock (lp=<optimized out>, period=<optimized out>, cpstamp=<optimized out>)
at /usr/src/sys/kern/usched_dfly.c:757
#3 0xffffffff8060e5ef in usched_schedulerclock (lp=lp@entry=0x0, periodic=2130068, time=2891606236)
at /usr/src/sys/kern/kern_usched.c:160
#4 0xffffffff805e0a8f in schedclock (info=0xffffffff81615630 <CPU_prvspace_bsp+9776>, in_ipi=<optimized out>,
frame=<optimized out>) at /usr/src/sys/kern/kern_clock.c:914
#5 0xffffffff806050d2 in systimer_intr (timep=timep@entry=0xffffffe328baf5ec, in_ipi=in_ipi@entry=0, frame=frame@entry=0x0)
at /usr/src/sys/kern/kern_systimer.c:100
#6 0xffffffff8060573a in pcpu_timer_process_oncpu (gd=0xffffffff81613000 <CPU_prvspace_bsp>, frame=frame@entry=0x0)
at /usr/src/sys/kern/kern_cputimer.c:599
#7 0xffffffff80605e6e in pcpu_timer_process () at /usr/src/sys/kern/kern_cputimer.c:605
#8 0xffffffff809f5ce4 in splz_timer () at /usr/src/sys/platform/pc64/x86_64/ipl.s:453
#9 0x0000000000000000 in ?? ()
(kgdb) frame 7
#7 0xffffffff80605e6e in pcpu_timer_process () at /usr/src/sys/kern/kern_cputimer.c:605
605 pcpu_timer_process_oncpu(mycpu, NULL);
(kgdb) info regi
rax *value not available*
rbx 0xffffffe1e105ef80 -129368723584
rcx *value not available*
rdx *value not available*
rsi *value not available*
rdi *value not available*
rbp 0xffffffe328baf610 0xffffffe328baf610
rsp 0xffffffe328baf610 0xffffffe328baf610
r8 *value not available*
r9 *value not available*
r10 *value not available*
r11 *value not available*
r12 0xffffffe00ec63558 -137191082664
r13 0xffffffff8085f3d1 -2138704943
r14 0x0 0
r15 0x0 0
rip 0xffffffff80605e6e 0xffffffff80605e6e <pcpu_timer_process+23>
eflags *value not available*
cs *value not available*
ss *value not available*
ds *value not available*
es *value not available*
fs *value not available*
gs *value not available*

(kgdb) x/a 0xffffffe328baf610
0xffffffe328baf610: 0xffffffe328baf640
(kgdb)
0xffffffe328baf618: 0xffffffff809f5ce4 <splz_timer+19>
(kgdb)
0xffffffe328baf620: 0x0
(kgdb)
0xffffffe328baf628: 0xffffffe328baf6e0
(kgdb)
0xffffffe328baf630: 0x297
(kgdb)
0xffffffe328baf638: 0xffffffff80618ef9 <lwkt_user_yield+37>
(kgdb)
0xffffffe328baf640: 0xffffffe328baf660
(kgdb)
0xffffffe328baf648: 0xffffffff8085acb4 <vm_object_page_remove_callback+31>
(kgdb)
0xffffffe328baf650: 0xffffffe00ec63558
(kgdb)
0xffffffe328baf658: 0xffffffe328baf6e0
(kgdb)
0xffffffe328baf660: 0xffffffe328baf6d0
(kgdb)
0xffffffe328baf668: 0xffffffff8085ec5e <vm_page_rb_tree_RB_SCAN+252>
(kgdb)
0xffffffe328baf670: 0xb0
(kgdb)
0xffffffe328baf678: 0xffffffff814bcdd8 <kernel_object+56>
(kgdb)
0xffffffe328baf680: 0xffffffff814bcdc8 <kernel_object+40>
(kgdb)
0xffffffe328baf688: 0xffffffff8085ac95 <vm_object_page_remove_callback>
(kgdb)
0xffffffe328baf690: 0x0
(kgdb)
0xffffffe328baf698: 0xffffffe00ebc6dc0
(kgdb)
0xffffffe328baf6a0: 0xfffffffffe4144e0
(kgdb)
0xffffffe328baf6a8: 0xffffffff814bcda0 <kernel_object>
(kgdb)
0xffffffe328baf6b0: 0xffffffff814bcdc8 <kernel_object+40>
(kgdb)
0xffffffe328baf6b8: 0xffffffff814bce08 <kernel_object+104>
(kgdb)
0xffffffe328baf6c0: 0x0
(kgdb)
0xffffffe328baf6c8: 0x0
(kgdb)
0xffffffe328baf6d0: 0xffffffe328baf770
(kgdb)
0xffffffe328baf6d8: 0xffffffff8085e014 <vm_object_page_remove+217>
(kgdb)
0xffffffe328baf6e0: 0xfffffffffe4144e0
(kgdb)
0xffffffe328baf6e8: 0xfffffffffe4144ff
(kgdb)
0xffffffe328baf6f0: 0xffffffe300000000
(kgdb)
0xffffffe328baf6f8: 0xffffffff00000000
(kgdb)
0xffffffe328baf700: 0xffffffe328baf740
(kgdb)
0xffffffe328baf708: 0xffffffff808601c2 <vm_page_wakeup+96>
(kgdb)
0xffffffe328baf710: 0xffffffe00ec63558
(kgdb)
0xffffffe328baf718: 0xffffffff814bcda0 <kernel_object>
(kgdb)
0xffffffe328baf720: 0xffffffe328baf770
(kgdb)
0xffffffe328baf728: 0xffffffff80851595 <vm_fault_unwire+267>
(kgdb)
0xffffffe328baf730: 0xffffffff813951f0 <kernel_map+240>
(kgdb)
0xffffffe328baf738: 0xffffffff814bcda0 <kernel_object>
(kgdb)
0xffffffe328baf740: 0xffffffff81395100 <kernel_map>
(kgdb)
0xffffffe328baf748: 0xffffffff814bcda0 <kernel_object>
(kgdb)
0xffffffe328baf750: 0xffffffff81395100 <kernel_map>
(kgdb)
0xffffffe328baf758: 0xffffffe414500000
(kgdb)
0xffffffe328baf760: 0xffffffff8140f548 <map_entry_init+258024>
(kgdb)
0xffffffe328baf768: 0xffffffff8141f6d8 <map_entry_init+323960>
(kgdb)
0xffffffe328baf770: 0xffffffe328baf810
(kgdb)
0xffffffe328baf778: 0xffffffff808566a7 <vm_map_delete+517>
(kgdb)
0xffffffe328baf780: 0xffffffff813951f0 <kernel_map+240>
(kgdb)
0xffffffe328baf788: 0xffffffff81395168 <kernel_map+104>
(kgdb)
0xffffffe328baf790: 0xfffffffffe414500
(kgdb)
0xffffffe328baf798: 0xffffffff81395100 <kernel_map>
(kgdb)
0xffffffe328baf7a0: 0xffffffff81395188 <kernel_map+136>
(kgdb)
0xffffffe328baf7a8: 0xffffffe328baf82c
(kgdb)
0xffffffe328baf7b0: 0xffffffe4144e0000
(kgdb)
0xffffffe328baf7b8: 0x20
(kgdb)
0xffffffe328baf7c0: 0xfffffffffe4144e0
(kgdb)
0xffffffe328baf7c8: 0xffffffe414500000
(kgdb)
0xffffffe328baf7d0: 0x0
(kgdb)
0xffffffe328baf7d8: 0xffffffff8140f548 <map_entry_init+258024>
(kgdb)
0xffffffe328baf7e0: 0xffffffe328baf810
(kgdb)
0xffffffe328baf7e8: 0xffffffff81395100 <kernel_map>
(kgdb)
0xffffffe328baf7f0: 0xffffffff81395188 <kernel_map+136>
(kgdb)
0xffffffe328baf7f8: 0xffffffe414500000
(kgdb)
0xffffffe328baf800: 0xffffffe4144e0000
(kgdb)
0xffffffe328baf808: 0xffffffff81613000 <CPU_prvspace_bsp>
(kgdb)
0xffffffe328baf810: 0xffffffe328baf850
(kgdb)
0xffffffe328baf818: 0xffffffff8085690d <vm_map_remove+105>
(kgdb)
0xffffffe328baf820: 0x189d
(kgdb)
0xffffffe328baf828: 0x228baf8b0
(kgdb)
0xffffffe328baf830: 0xffffffff82256a60 <M_ACPITASK>
(kgdb)
0xffffffe328baf838: 0x58
(kgdb)
0xffffffe328baf840: 0xffffffe4144e0000
(kgdb)
0xffffffe328baf848: 0xffffffff81613000 <CPU_prvspace_bsp>
(kgdb)
0xffffffe328baf850: 0xffffffe328baf860
(kgdb)
0xffffffe328baf858: 0xffffffff80602e74 <kmem_slab_free+43>
(kgdb)
0xffffffe328baf860: 0xffffffe328baf8c0
(kgdb)
0xffffffe328baf868: 0xffffffff80603683 <kmalloc+530>
(kgdb)
0xffffffe328baf870: 0xffffffe0380f8808
(kgdb)
0xffffffe328baf878: 0xffffffe0380f8814
(kgdb)
0xffffffe328baf880: 0xffffffe328baf8b0
(kgdb)
0xffffffe328baf888: 0x130282226b6a
(kgdb)
0xffffffe328baf890: 0xffffffff82259298 <AcpiGbl_XPm1aEnable>
(kgdb)
0xffffffe328baf898: 0x2
(kgdb)
0xffffffe328baf8a0: 0xffffffe338cd8a90
(kgdb)
0xffffffe328baf8a8: 0xffffffff8221b962 <AcpiEvAsynchExecuteGpeMethod>
(kgdb)
0xffffffe328baf8b0: 0x66
(kgdb)
0xffffffe328baf8b8: 0x990
(kgdb)
0xffffffe328baf8c0: 0xffffffe328baf8f0
(kgdb)
0xffffffe328baf8c8: 0xffffffff8224db6e <AcpiOsExecute+61>
(kgdb)
0xffffffe328baf8d0: 0xffffffe338cd8a90
(kgdb)
0xffffffe328baf8d8: 0xffffffe338cd8a90
(kgdb)
0xffffffe328baf8e0: 0x66
(kgdb)
0xffffffe328baf8e8: 0xffffffe037f23bf8
(kgdb)
0xffffffe328baf8f0: 0xffffffe328baf920
(kgdb)
0xffffffe328baf8f8: 0xffffffff8221bb32 <AcpiEvGpeDispatch+241>
(kgdb)
0xffffffe328baf900: 0xffffffe328baf968
(kgdb)
0xffffffe328baf908: 0xffffffe338cd8100
(kgdb)
0xffffffe328baf910: 0x6
(kgdb)
0xffffffe328baf918: 0x40
(kgdb)
0xffffffe328baf920: 0xffffffe328baf9a0
(kgdb)
0xffffffe328baf928: 0xffffffff8221bd07 <AcpiEvGpeDetect+371>
(kgdb)
0xffffffe328baf930: 0x0
(kgdb)
0xffffffe328baf938: 0x0
(kgdb)
0xffffffe328baf940: 0xc
(kgdb)
0xffffffe328baf948: 0xffffffe0380f8808
(kgdb)
0xffffffe328baf950: 0xffffffe328bf35e0
(kgdb)
0xffffffe328baf958: 0xffffffe037f23bf8
(kgdb)
0xffffffe328baf960: 0xffffffe1e105ea00
(kgdb)
0xffffffe328baf968: 0x4000000046
(kgdb)
0xffffffe328baf970: 0xffffffe1e105ef80
(kgdb)
0xffffffe328baf978: 0x0
(kgdb)
0xffffffe328baf980: 0xffffffe037ec42e0
(kgdb)
0xffffffe328baf988: 0xffffffff81613000 <CPU_prvspace_bsp>
(kgdb)
0xffffffe328baf990: 0x0
(kgdb)
0xffffffe328baf998: 0xffffffe1e105f178
(kgdb)
0xffffffe328baf9a0: 0xffffffe328baf9c0
(kgdb)
0xffffffe328baf9a8: 0xffffffff8221dc55 <AcpiEvSciXruptHandler+25>
(kgdb)
0xffffffe328baf9b0: 0xffffffe037f02fc0
(kgdb)
0xffffffe328baf9b8: 0x0
(kgdb)
0xffffffe328baf9c0: 0xffffffe328baf9d0
(kgdb)
0xffffffe328baf9c8: 0xffffffff8224d4ab <InterruptWrapper+30>
(kgdb)
0xffffffe328baf9d0: 0xffffffe328bafab0
(kgdb)
0xffffffe328baf9d8: 0xffffffff805d8cef <ithread_handler+592>
(kgdb)
0xffffffe328baf9e0: 0x0
(kgdb)
0xffffffe328baf9e8: 0xffffffe1e0955288
(kgdb)
0xffffffe328baf9f0: 0xffffffe1e0955288
(kgdb)
0xffffffe328baf9f8: 0x9
(kgdb)
0xffffffe328bafa00: 0x0
(kgdb)
0xffffffe328bafa08: 0x288
(kgdb)
0xffffffe328bafa10: 0x0
(kgdb)
0xffffffe328bafa18: 0xffffffe1e0955000
(kgdb)
0xffffffe328bafa20: 0x90000b90c
(kgdb)
0xffffffe328bafa28: 0x1000004b6
(kgdb)
0xffffffe328bafa30: 0x0
(kgdb)
0xffffffe328bafa38: 0x0
(kgdb)
0xffffffe328bafa40: 0x0
(kgdb)
0xffffffe328bafa48: 0x0
(kgdb)
0xffffffe328bafa50: 0x0
(kgdb)
0xffffffe328bafa58: 0x0
(kgdb)
0xffffffe328bafa60: 0x0
(kgdb)
0xffffffe328bafa68: 0x0
(kgdb)
0xffffffe328bafa70: 0x0
(kgdb)
0xffffffe328bafa78: 0x0
(kgdb)
0xffffffe328bafa80: 0xffffffff80fc0240 <thread0>
(kgdb)
0xffffffe328bafa88: 0xffffffff80fc0240 <thread0>
(kgdb)
0xffffffe328bafa90: 0xffffffff81613000 <CPU_prvspace_bsp>
(kgdb)
0xffffffe328bafa98: 0xffffffe328bafac0
(kgdb)
0xffffffe328bafaa0: 0x0
(kgdb)
0xffffffe328bafaa8: 0xffffffe1e0955000
(kgdb)
0xffffffe328bafab0: 0x0
(kgdb)
0xffffffe328bafab8: 0xffffffff806196f3 <lwkt_exit>
(kgdb)
0xffffffe328bafac0: 0x0
(kgdb)
0xffffffe328bafac8: 0x0
(kgdb)
0xffffffe328bafad0: 0x0
(kgdb)
0xffffffe328bafad8: 0x0
(kgdb)
0xffffffe328bafae0: 0x0
(kgdb)
0xffffffe328bafae8: 0x0
(kgdb)
0xffffffe328bafaf0: 0x0
(kgdb)
0xffffffe328bafaf8: 0x0
(kgdb)
0xffffffe328bafb00: 0x0
(kgdb)
0xffffffe328bafb08: 0x0
(kgdb)
0xffffffe328bafb10: 0x0
(kgdb)
0xffffffe328bafb18: 0x0
(kgdb)
0xffffffe328bafb20: 0x9
(kgdb)
0xffffffe328bafb28: 0x0
(kgdb)
0xffffffe328bafb30: 0x0
(kgdb)
0xffffffe328bafb38: 0xffffffff805d8a9f <ithread_handler>
(kgdb)
0xffffffe328bafb40: 0x0
(kgdb)
---
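
One way to separate live return addresses from stale stack contents in a dump like this is to follow the saved frame pointers. The sketch below assumes the kernel was built with frame pointers, so that `[rbp]` holds the caller's saved `%rbp` and `[rbp + 8]` holds the return address. The `memory` and `symbols` dictionaries are a hand-copied excerpt of the dump, and `walk_frames` is a hypothetical helper, not a kgdb feature.

```python
def walk_frames(memory, rbp, max_frames=64):
    """Follow the saved-%rbp chain, yielding (frame_address, return_address).

    Assumes the standard x86-64 frame layout: [rbp] holds the caller's
    saved %rbp and [rbp + 8] holds the return address.
    """
    visited = set()
    while len(visited) < max_frames and rbp not in visited:
        if rbp not in memory or rbp + 8 not in memory:
            break  # ran off the captured part of the stack
        visited.add(rbp)
        yield rbp, memory[rbp + 8]
        rbp = memory[rbp]

# A few slots copied from the x/a transcript (stack address -> value).
memory = {
    0xffffffe328baf610: 0xffffffe328baf640,
    0xffffffe328baf618: 0xffffffff809f5ce4,
    0xffffffe328baf638: 0xffffffff80618ef9,
    0xffffffe328baf640: 0xffffffe328baf660,
    0xffffffe328baf648: 0xffffffff8085acb4,
    0xffffffe328baf660: 0xffffffe328baf6d0,
    0xffffffe328baf668: 0xffffffff8085ec5e,
}

# Symbol annotations as printed by kgdb for those values.
symbols = {
    0xffffffff809f5ce4: "splz_timer+19",
    0xffffffff80618ef9: "lwkt_user_yield+37",
    0xffffffff8085acb4: "vm_object_page_remove_callback+31",
    0xffffffff8085ec5e: "vm_page_rb_tree_RB_SCAN+252",
}

# Start from the %rbp value reported by "info regi" in frame 7.
for frame, ret in walk_frames(memory, 0xffffffe328baf610):
    print(f"{frame:#x}: {symbols.get(ret, hex(ret))}")
```

The chain is a filter, not a complete backtrace. The `lwkt_user_yield+37` slot at `0xffffffe328baf638` is not reached by the walk, most likely because the code between it and the next frame does not set up a frame pointer, yet it can still be a real return address.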

#3 Updated by peeter 8 months ago

Just to add that the crash does not seem to be easy to reproduce. I rebooted and ran buildworld/buildkernel with no problems. Prior to the crash the machine had been mostly idle overnight (uptime ~13 hrs), running a desktop and a few apps.

#4 Updated by peeter 8 months ago

I tried to trace the callers back from pcpu_timer_process (I could be wrong, though):

---
sys/platform/pc64/x86_64/ipl.s: 453: call pcpu_timer_process

sys/platform/pc64/x86_64/ipl.s: 359: jnz splz_timer

sys/platform/pc64/x86_64/ipl.s: 345: ENTRY(splz)

sys/kern/lwkt_thread.c: 1114: lwkt_user_yield(void): 1124: splz();
---

#5 Updated by dillon 8 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 80

Excellent backtrace. I believe I found the issue, and I committed a fix to both master and the release branch that should work around the problem.

The issue is that the ACPI code is issuing a kmalloc() while holding a spinlock. This is illegal, but I couldn't immediately fix it (I have to research it a bit more). However, the particular panic is from a yield via that path (kmalloc -> issues any prior staged frees -> vm system -> lwkt_user_yield), and I can detect the problem in the yield and return rather than yield for now.

-Matt
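
The idea of the workaround (detect a held spinlock in the yield path and return instead of yielding) can be illustrated with a toy model. This is not the committed kernel change, which is not shown in this thread. `Cpu`, `Panic`, `scheduler_clock` and `user_yield` are hypothetical stand-ins written in Python purely to show the control flow.

```python
class Panic(Exception):
    """Stands in for a kernel panic in this toy model."""

class Cpu:
    """Toy per-CPU state (hypothetical, not the kernel's globaldata)."""
    def __init__(self):
        self.spinlocks = 0         # number of spinlocks currently held
        self.pending_timer = True  # a timer tick is waiting to be processed

def scheduler_clock(cpu):
    # Mirrors the failed assertion: the scheduler clock must not run
    # while the current CPU holds a spinlock.
    if cpu.spinlocks != 0:
        raise Panic('assertion "gd->gd_spinlocks == 0" failed')

def user_yield(cpu, guarded):
    """Return True if pending work was processed, False if the yield bailed out."""
    if guarded and cpu.spinlocks != 0:
        return False  # the workaround: do not yield while a spinlock is held
    if cpu.pending_timer:
        scheduler_clock(cpu)
        cpu.pending_timer = False
    return True

cpu = Cpu()
cpu.spinlocks = 1  # the caller allocates memory while holding a spinlock

print(user_yield(cpu, guarded=True))  # bails out, the tick stays pending
try:
    user_yield(cpu, guarded=False)
except Panic as exc:
    print("panic:", exc)
```

In this model the guarded path only defers the pending work. The underlying problem, an allocation made while a spinlock is held, is still there.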

#6 Updated by peeter 8 months ago

Many thanks, good news that it can be fixed!
