Bug #568

ACPI(?) double-free on shutdown (More K9AGM fun)

Added by floid over 7 years ago. Updated over 4 years ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

This was happening with my 1.8 kernel and continues with 1.9; doing a reboot or
shutdown -r produces it consistently.

100_0473.jpg (76.7 KB) floid, 03/02/2007 01:20 PM

100_0474.jpg (84.8 KB) floid, 03/02/2007 01:20 PM

1.9.oldata.smp.dmesg.boot (51.3 KB) floid, 03/03/2007 07:57 PM

UP_Panic_1.jpg (66.5 KB) floid, 03/04/2007 11:02 PM

UP_Panic_2.jpg (78.3 KB) floid, 03/04/2007 11:02 PM

1.9.up.dmesg.boot (24.8 KB) floid, 03/04/2007 11:07 PM

Patch_Panic_1.png (249 KB) floid, 03/08/2007 02:24 AM

Patch_Panic_2.png (225 KB) floid, 03/08/2007 02:24 AM

double-free.patch Magnifier (4.65 KB) qhwt+dfly, 03/08/2007 11:12 PM

3-08-2007 qwht patch panic 1.png (227 KB) floid, 03/11/2007 03:29 AM

3-08-2007 qwht patch panic 2.png (233 KB) floid, 03/11/2007 03:29 AM

debug-cache.diff.gz (1.88 KB) qhwt+dfly, 03/11/2007 06:19 AM

debug-cache.diff.gz (1.94 KB) qhwt+dfly, 03/11/2007 09:46 AM

History

#1 Updated by floid over 7 years ago

(second page of panic uploaded)

#2 Updated by qhwt+dfly over 7 years ago

Please upload (or send me privately) gzip'ed /var/run/dmesg.boot
from boot -v (drop into the boot loader and type `boot -v').
Do you experience the same panic with a UP kernel and ACPI driver
without 'CFLAGS="-DSMP=1"'?

Cheers.

#3 Updated by floid over 7 years ago

SMP dmesg uploaded to the bug-tracker.

The dirty filesystem is from a lazy press of the reset button and has nothing to
do with this bug.

I'll try to check with a UP kernel soon.

-Thanks,
-Floid

#4 Updated by floid over 7 years ago

Uniprocessor dmesg, and screenshots of it panicking too, now present at
http://bugs.dragonflybsd.org/issue568 .

-Floid

#5 Updated by dillon over 7 years ago

:SMP dmesg uploaded to the bug-tracker.
:
:The dirty filesystem is from a lazy press of the reset button and has nothi=
:ng to
:do with this bug.
:
:I'll try to check with a UP kernel soon.
:
:-Thanks,
:-Floid

Hmm. ACPI is probably calling AcpiOsReleaseObject() twice for the same
object sometime prior to the call to AcpiOsDeleteCache().

Lets see if we can catch it in the act. Try this patch and get a
backtrace when you get the (hopefully new) panic.

If we can catch it in the act earlier and get a good backtrace we may
have enough information to fix the bug or, if messing with the contrib
code is too messy, just ignore the error instead of panicing.

-Matt

Index: dev/acpica5/Osd/OsdCache.c
===================================================================
RCS file: /cvs/src/sys/dev/acpica5/Osd/OsdCache.c,v
retrieving revision 1.2
diff -u -r1.2 OsdCache.c
--- dev/acpica5/Osd/OsdCache.c 19 Jan 2007 23:58:53 -0000 1.2
+++ dev/acpica5/Osd/OsdCache.c 6 Mar 2007 16:52:41 -0000
@@ -39,6 +39,20 @@
struct objcache_malloc_args args;
};

+/*
+ * Add some magic numbers to catch double-frees earlier rather
+ * then later.
+ */
+struct acpiobjhead {
+ int state;
+ int unused;
+};
+
+#define TRACK_ALLOCATED 0x7AF45533
+#define TRACK_FREED 0x7B056644
+
+#define OBJHEADSIZE sizeof(struct acpiobjhead)
+
#include "acpi.h"

#ifndef ACPI_USE_LOCAL_CACHE
@@ -50,7 +64,7 @@
ACPI_CACHE_T *cache;

cache = kmalloc(sizeof(*cache), M_TEMP, M_WAITOK);
- cache->args.objsize = ObjectSize;
+ cache->args.objsize = OBJHEADSIZE + ObjectSize;
cache->args.mtype = M_CACHE;
cache->cache = objcache_create(CacheName, 0, 0, NULL, NULL,
NULL, objcache_malloc_alloc, objcache_malloc_free, &cache->args);
@@ -79,17 +93,29 @@
void *
AcpiOsAcquireObject(ACPI_CACHE_T *Cache)
{
+ struct acpiobjhead *head;
void *Object;

- Object = objcache_get(Cache->cache, M_WAITOK);
- bzero(Object, Cache->args.objsize);
- return Object;
+ head = objcache_get(Cache->cache, M_WAITOK);
+ bzero(head, Cache->args.objsize);
+ head->state = TRACK_ALLOCATED;
+ return (head + 1);
}

ACPI_STATUS
AcpiOsReleaseObject(ACPI_CACHE_T *Cache, void *Object)
{
- objcache_put(Cache->cache, Object);
+ struct acpiobjhead *head = (void *)((char *)Object - OBJHEADSIZE);
+
+ if (head->state != TRACK_ALLOCATED) {
+ if (head->state == TRACK_FREED)
+ panic("AcpiOsReleaseObject: Double Free %p (%08x)", Object, head->state);
+ else
+ panic("AcpiOsReleaseObject: Bad object %p (%08x)", Object, head->state);
+ }
+ head->state = TRACK_FREED;
+
+ objcache_put(Cache->cache, head);
return AE_OK;
}

#6 Updated by floid over 7 years ago

Looks like the patch was a winner; what you've won, I'm not experienced (or
awake) enough to figure out.

http://bugs.dragonflybsd.org/file213/Patch_Panic_1.png and
http://bugs.dragonflybsd.org/file214/Patch_Panic_2.png ,

hopefully the PNGs are less eye-searingly blurry than the JPGs were, and
hopefully I'll have a working serial header within a week or two.

-Floid

#7 Updated by dillon over 7 years ago

:Joe "Floid" Kanowitz <> added the comment:
:
:Looks like the patch was a winner; what you've won, I'm not experienced (or
:awake) enough to figure out.
:
:http://bugs.dragonflybsd.org/file213/Patch_Panic_1.png and
:http://bugs.dragonflybsd.org/file214/Patch_Panic_2.png ,
:
:hopefully the PNGs are less eye-searingly blurry than the JPGs were, and
:hopefully I'll have a working serial header within a week or two.
:
:-Floid

Yah. It looks like a reference count problem in the ACPI contrib code,
and it doesn't look like it will be easy to find.

The ACPI code is clearly reusing freed memory so it may panic in
other ways, but we can try just letting this error slide and see if
that stops your panics. Here's a patch to try.

-Matt
Matthew Dillon
<>

Index: Osd/OsdCache.c
===================================================================
RCS file: /cvs/src/sys/dev/acpica5/Osd/OsdCache.c,v
retrieving revision 1.2
diff -u -r1.2 OsdCache.c
--- Osd/OsdCache.c 19 Jan 2007 23:58:53 -0000 1.2
+++ Osd/OsdCache.c 8 Mar 2007 16:29:37 -0000
@@ -39,6 +39,20 @@
struct objcache_malloc_args args;
};

+/*
+ * Add some magic numbers to catch double-frees earlier rather
+ * then later.
+ */
+struct acpiobjhead {
+ int state;
+ int unused;
+};
+
+#define TRACK_ALLOCATED 0x7AF45533
+#define TRACK_FREED 0x7B056644
+
+#define OBJHEADSIZE sizeof(struct acpiobjhead)
+
#include "acpi.h"

#ifndef ACPI_USE_LOCAL_CACHE
@@ -50,7 +64,7 @@
ACPI_CACHE_T *cache;

cache = kmalloc(sizeof(*cache), M_TEMP, M_WAITOK);
- cache->args.objsize = ObjectSize;
+ cache->args.objsize = OBJHEADSIZE + ObjectSize;
cache->args.mtype = M_CACHE;
cache->cache = objcache_create(CacheName, 0, 0, NULL, NULL,
NULL, objcache_malloc_alloc, objcache_malloc_free, &cache->args);
@@ -79,17 +93,30 @@
void *
AcpiOsAcquireObject(ACPI_CACHE_T *Cache)
{
+ struct acpiobjhead *head;
void *Object;

- Object = objcache_get(Cache->cache, M_WAITOK);
- bzero(Object, Cache->args.objsize);
- return Object;
+ head = objcache_get(Cache->cache, M_WAITOK);
+ bzero(head, Cache->args.objsize);
+ head->state = TRACK_ALLOCATED;
+ return (head + 1);
}

ACPI_STATUS
AcpiOsReleaseObject(ACPI_CACHE_T *Cache, void *Object)
{
- objcache_put(Cache->cache, Object);
+ struct acpiobjhead *head = (void *)((char *)Object - OBJHEADSIZE);
+
+ if (head->state != TRACK_ALLOCATED) {
+ if (head->state == TRACK_FREED)
+ printf("AcpiOsReleaseObject: Double Free %p (%08x)\n", Object, head->state);
+ else
+ printf("AcpiOsReleaseObject: Bad object %p (%08x)\n", Object, head->state);
+ return AE_OK;
+ }
+ head->state = TRACK_FREED;
+
+ objcache_put(Cache->cache, head);
return AE_OK;
}

#8 Updated by dillon over 7 years ago

Oops, here is a new patch. I forgot the 'k' in the 'kprintf' in the
last one.

-Matt

Index: dev/acpica5/Osd/OsdCache.c
===================================================================
RCS file: /cvs/src/sys/dev/acpica5/Osd/OsdCache.c,v
retrieving revision 1.2
diff -u -r1.2 OsdCache.c
--- dev/acpica5/Osd/OsdCache.c 19 Jan 2007 23:58:53 -0000 1.2
+++ dev/acpica5/Osd/OsdCache.c 8 Mar 2007 16:49:31 -0000
@@ -39,6 +39,20 @@
struct objcache_malloc_args args;
};

+/*
+ * Add some magic numbers to catch double-frees earlier rather
+ * then later.
+ */
+struct acpiobjhead {
+ int state;
+ int unused;
+};
+
+#define TRACK_ALLOCATED 0x7AF45533
+#define TRACK_FREED 0x7B056644
+
+#define OBJHEADSIZE sizeof(struct acpiobjhead)
+
#include "acpi.h"

#ifndef ACPI_USE_LOCAL_CACHE
@@ -50,7 +64,7 @@
ACPI_CACHE_T *cache;

cache = kmalloc(sizeof(*cache), M_TEMP, M_WAITOK);
- cache->args.objsize = ObjectSize;
+ cache->args.objsize = OBJHEADSIZE + ObjectSize;
cache->args.mtype = M_CACHE;
cache->cache = objcache_create(CacheName, 0, 0, NULL, NULL,
NULL, objcache_malloc_alloc, objcache_malloc_free, &cache->args);
@@ -79,17 +93,30 @@
void *
AcpiOsAcquireObject(ACPI_CACHE_T *Cache)
{
+ struct acpiobjhead *head;
void *Object;

- Object = objcache_get(Cache->cache, M_WAITOK);
- bzero(Object, Cache->args.objsize);
- return Object;
+ head = objcache_get(Cache->cache, M_WAITOK);
+ bzero(head, Cache->args.objsize);
+ head->state = TRACK_ALLOCATED;
+ return (head + 1);
}

ACPI_STATUS
AcpiOsReleaseObject(ACPI_CACHE_T *Cache, void *Object)
{
- objcache_put(Cache->cache, Object);
+ struct acpiobjhead *head = (void *)((char *)Object - OBJHEADSIZE);
+
+ if (head->state != TRACK_ALLOCATED) {
+ if (head->state == TRACK_FREED)
+ kprintf("AcpiOsReleaseObject: Double Free %p (%08x)\n", Object, head->state);
+ else
+ kprintf("AcpiOsReleaseObject: Bad object %p (%08x)\n", Object, head->state);
+ return AE_OK;
+ }
+ head->state = TRACK_FREED;
+
+ objcache_put(Cache->cache, head);
return AE_OK;
}

#9 Updated by qhwt+dfly over 7 years ago

Can you try attached patch? AcpiUtUpdateObjectReference() can leave
Object->CommonNotify.SystemNotify or Object->CommonNotify.DeviceNotify
non-NULL after they are (eventually) released by AcpiUtDeleteObjectDesc().
AcpiUtDeleteObjectDesc() can leak the object if it's not of type
ACPI_OPERAND_OBJECT, so it also may have to be converted to accept
(ACPI_OPERAND_OBJECT **).

Cheers.

#10 Updated by floid over 7 years ago

@ Tomokazu,

Unfortunately your patch doesn't seem to be changing the behavior.
http://bugs.dragonflybsd.org/file216/3-08-2007%20qwht%20patch%20panic%201.png and
http://bugs.dragonflybsd.org/file217/3-08-2007%20qwht%20patch%20panic%202.png
show the result with a SMP kernel built from today's sources (+ patch); UP did
basically the same thing with some other quirks. (I use gdm, and for some
inexplicable reason moving the mouse causes it to freeze with the UP kernel.)

@ Matt,

Since Tomokazu's patch showed up before I could test, I still haven't tried
yours and will do so next just to prove it. This bug obviously sucks for anyone
with a need to reboot remotely, but FWIW, #566 -- the NATA/SB600/?? problem --
is a much bigger annoyance to someone at the console.

-With continued thanks for your attention, both,
-Joe "Floid" Kanowitz

#11 Updated by qhwt+dfly over 7 years ago

I was under the impression that just going to single-user mode and
type `reboot' could trigger the panic, no?

Unfortunately I can't reproduce the same panic on both UP and SMP kernels
here. Can you try attached patch? This is a slightly modified version of
Matt's patch but can print the function and the line number of the previous
caller of AcpiOsReleaseCache(). To build the acpi driver, revert previous
patches, then
$ export ACPI_DEBUG=yes ACPI_DEBUG_CACHE=yes
(for SMP kernel, you also need "export CFLAGS='-O -pipe -DSMP=1'")
$ cd /sys/dev/acpica5
$ make cleandir; make cleandir
$ make -s obj && make -s depend && make -s
$ su
# make install && reboot

You don't need to try this for both UP and SMP kernels, as either of them
can trigger the panic. Note that this patch avoids the double-free panic
so you need to use `shutdown -p' or `shutdown -h' instead of `reboot'
(unless dmesg survives across reboot).

Cheers.

#12 Updated by qhwt+dfly over 7 years ago

I added one more code to examine that objects are put back to
the correct cache. Please try this one instead.

Cheers.

#13 Updated by qhwt+dfly over 7 years ago

Hello?

#14 Updated by floid over 7 years ago

--- YONETANI Tomokazu <> wrote:

Sorry for the huge delay -- I dearly want to test this, but family
issues and so on have been eating all my time. Hopefully I can get to
it this evening if nothing else dramatic comes up.

#15 Updated by qhwt+dfly over 7 years ago

That's OK, I'm glad you're still with DragonFly :). If you already have
revision 1.3 of Osd/OsdCache.c, the last patch won't apply anymore,
so please try the version I just committed instead.
$ sh
$ export ACPI_DEBUG=yes ACPI_DEBUG_CACHE=yes
$ cd /sys/dev/acpica5
$ cvs update -C
$ make cleandir; make cleandir
$ make -s obj && make -s depend && make -s
$ make install

Cheers.

#16 Updated by alexh almost 5 years ago

ACPI had a major overhaul a few months ago. Does this still happen?

Cheers,
Alex Hornung

Also available in: Atom PDF