Bug #3127: DragonFly 5.1: panic: assertion "count & TOK_COUNTMASK" failed in _lwkt_reltokref at /usr/src/sys/kern/lwkt_token.c:410 - DragonFlyBSD - DragonFlyBSD bugtracker

Actions

Copy link

Bug #3127

closed

DragonFly 5.1: panic: assertion "count & TOK_COUNTMASK" failed in _lwkt_reltokref at /usr/src/sys/kern/lwkt_token.c:410

Added by peeter over 7 years ago. Updated about 6 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

dillon

Category:

Target version:

Start date:

04/02/2018

Due date:

% Done:

Estimated time:

Description

It is DragonFly 5.1 and the panic seems to originate in the ACPI subsystem. It's the same machine as in an earlier bug report

http://bugs.dragonflybsd.org/issues/2934

where the bug was ACPI related too.

---
Version String: DragonFly v5.1.0.1072.g51a52-DEVELOPMENT

---
panic with 1 spinlocks held
panic: assertion "count & TOK_COUNTMASK" failed in _lwkt_reltokref at /usr/src/sys/kern/lwkt_token.c:410

---
(kgdb) #0 _get_mycpu () at ./machine/thread.h:69
#1 panic (
fmt=fmt@entry=0xffffffff80bb7298 "assertion \"%s\" failed in %s at %s:%u")
at /usr/src/sys/kern/kern_shutdown.c:861
#2 0xffffffff80611a80 in _lwkt_reltokref (td=0xfffff80223a39780,
ref=<optimized out>) at /usr/src/sys/kern/lwkt_token.c:410
#3 lwkt_relalltokens (td=td@entry=0xfffff80223a39780)
at /usr/src/sys/kern/lwkt_token.c:506
#4 0xffffffff805f8e1a in panic (
fmt=fmt@entry=0xffffffff80beed38 "lwkt_switch: still holding %d exclusive spinlocks!") at /usr/src/sys/kern/kern_shutdown.c:814
#5 0xffffffff8060eafc in lwkt_switch ()
at /usr/src/sys/kern/lwkt_thread.c:645
#6 0xffffffff80618498 in tsleep (
ident=ident@entry=0xffffffff817f0d00 <kernel_map+128>,
flags=flags@entry=1024, wmesg=<optimized out>, timo=timo@entry=0)
at /usr/src/sys/kern/kern_synch.c:716
#7 0xffffffff805e80b0 in lockmgr_exclusive (
lkp=lkp@entry=0xffffffff817f0d00 <kernel_map+128>, flags=flags@entry=2)
at /usr/src/sys/kern/kern_lock.c:382
#8 0xffffffff80901e39 in lockmgr (flags=2,
lkp=0xffffffff817f0d00 <kernel_map+128>) at /usr/src/sys/sys/lock.h:271
#9 vm_map_remove (map=0xffffffff817f0c80 <kernel_map>,
start=18446735302252429312, end=18446735302252560384)
at /usr/src/sys/vm/vm_map.c:3112
#10 0xffffffff805f393b in kmem_slab_free (ptr=<optimized out>,
size=<optimized out>) at /usr/src/sys/kern/kern_slaballoc.c:1670
#11 0xffffffff805f4827 in kmalloc (size=size@entry=88,
type=type@entry=0xffffffff81391e00 <M_ACPITASK>, flags=flags@entry=4866)
at /usr/src/sys/kern/kern_slaballoc.c:732
#12 0xffffffff8097356c in AcpiOsExecute (Type=Type@entry=OSL_GPE_HANDLER,
Function=Function@entry=0xffffffff80979e20 <AcpiEvAsynchExecuteGpeMethod>, Context=Context@entry=0xfffff8037ae17290)
at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:138
#13 0xffffffff80979fc6 in AcpiEvGpeDispatch (GpeDevice=0xfffff8007a3a3658,
GpeEventInfo=0xfffff8037ae17290, GpeNumber=102)
at /usr/src/sys/contrib/dev/acpica/source/components/events/evgpe.c:993
#14 0xffffffff8097a15c in AcpiEvGpeDetect (
GpeXruptList=GpeXruptList@entry=0xfffff8007a323ee0)
at /usr/src/sys/contrib/dev/acpica/source/components/events/evgpe.c:698
#15 0xffffffff8097c2d9 in AcpiEvSciXruptHandler (Context=0xfffff8007a323ee0)
at /usr/src/sys/contrib/dev/acpica/source/components/events/evsci.c:261
#16 0xffffffff80972de1 in InterruptWrapper (arg=0xfffff8007a323ee0)
at /usr/src/sys/dev/acpica/Osd/OsdInterrupt.c:158
#17 0xffffffff805c6619 in ithread_handler (arg=<optimized out>)
at /usr/src/sys/kern/kern_intr.c:899
#18 0xffffffff8060b710 in ?? () at /usr/src/sys/kern/lwkt_thread.c:1748
#19 0x0000000000000000 in ?? ()
---

Actions

Copy link

Updated by peeter over 7 years ago

Core dump also on leaf in crash/2018-21-MARCH_acpi.

Actions

Copy link

Updated by dillon over 7 years ago

Status changed from New to In Progress
Assignee set to dillon

How repeatable is this bug? It looks like ACPI is trying to hold a spin lock across a sleep. I think its a problem with AcpiOsAcquireLock() and friends so I put together a quick patch:

http://apollo.backplane.com/DFlyMisc/acpi07.patch

This is a bit of a problem because AcpiOsAcquireLock is supposed to be a spinlock, and can be used from the idle thread (which is not allowed to block). So the patch is a bit of a hack. But it might do the job.

-Matt

Actions

Copy link

Updated by peeter over 7 years ago

Many thanks---I'll apply the patch later tonight and report back.

It's not easily repeatable, i.e. I have no idea what exactly triggers it. But there seems to be a maximum time period of about two to four days in which the machine will almost certainly panic.

Actions

Copy link

Updated by peeter over 7 years ago

Hello, the machine has an uptime of four days now. I'd say the patch has made a difference. I'll report when it will have made 14 days of uptime---then we know for sure.

Actions

Copy link