Bug #557

1.9.0-DEVELOPMENT hangs with ACPI enabled

Added by marianitn almost 8 years ago. Updated almost 8 years ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

Hi,
I've upgraded my system to HEAD as of Feb 05, and when I boot with ACPI
enabled, system hangs at the point shown on attached dmesg.
I also tried 1.8-RELEASE but my system hangs and I can't even see the output
because the screen turns off.
I followed instructions on
http://leaf.dragonflybsd.org/mailarchive/commits/2007-01/msg00269.html for
building acpi.ko with more debugging info.
HTH. I can provide more info if needed.

Regards

dmesg.acpi (56.3 KB) marianitn, 02/12/2007 09:53 PM

dmesg.txt Magnifier (5.27 KB) marianitn, 02/14/2007 11:55 AM

acpi-nocst.diff Magnifier (1.04 KB) qhwt+dfly, 02/15/2007 09:23 AM

acpi-test.diff Magnifier (2.79 KB) qhwt+dfly, 02/20/2007 11:00 PM

acpi-test-2.diff Magnifier (2.58 KB) qhwt+dfly, 02/24/2007 02:51 PM

dmesg.knob5.gz (6.3 KB) marianitn, 02/26/2007 01:00 PM

acpi-test-3.diff Magnifier (3.52 KB) qhwt+dfly, 02/28/2007 11:58 AM

acpi-test-4.diff Magnifier (2.41 KB) qhwt+dfly, 03/06/2007 12:42 PM

History

#1 Updated by qhwt+dfly almost 8 years ago

"hang" means that neither ctrl+alt+esc nor ctrl+alt+del works?

That sounds new...

ACPI_ALL_DRIVERS only enables debugging output from OS implementation
and not in ACPI-CA components, but before using ACPI_ALL_COMPONENTS,
you might want to narrow down the components having problems in the
new ACPI driver (or the screen gets flooded with function traces).

Does any of the following command in the boot loader make the boot
proceed any further?
set debug.acpi.disabled="ec"
set debug.acpi.disabled="cmbat"
set debug.acpi.disabled="acad"
:

It's a space-delimited list of sub-driver names mentioned in the output
from "man acpi | less +/sub-".

BTW, what was the manufacturer and the model of your mainboard/PC/laptop?

Cheers.

#2 Updated by marianitn almost 8 years ago

Hi,

Yes, neither ctrl+alt+esc nor ctrl+alt+del works.

Yes, I've tried them all. Whith 3 of them the system boots OK: bus,
children and cpu.

My PC's mainboard is ASUS P4V8X-X. I attach dmesg with
debug.acpi.disabled="cpu" for more info.

#3 Updated by qhwt+dfly almost 8 years ago

"bus" or "children" disables "cpu", so it must be "cpu" :)

A-ha, I think there's a relavant information WRT your mainboard:
http://lkml.org/lkml/2005/12/5/263

Can you try attached patch and set the variable "debug.acpi.cpu.nocst"
in the boot loader? Also if you have time, can you try recent FreeBSD
and if it boots OK.

Cheers.

#4 Updated by qhwt+dfly almost 8 years ago

.. I wanted to say "You have to write down the last few lines by hand
(or take a picture of it) unless the message buffer survives across
reboot".

#5 Updated by qhwt+dfly almost 8 years ago

Well, we're trying to find where it's hanging, so we can't get something
really useful without hanging your PC, unless the message buffer survives
across reboot(some does). Roughly there are three places where the hang
can occur; probe, attach, and the idle hook, and we need to determine
where it is hanging. If you remove "ACPI_NAMESPACE" from debug.acpi.layer
and reboot without disabling "cpu" subsystem, do you see a few lines
containing "cpu", "CPU", or "Processor" on the console? If it does,
let me know them.

Cheers.

#6 Updated by qhwt+dfly almost 8 years ago

Then I spoke too soon :). It appears to be after the import of
ACPICA-20061109. Can you try this in the boot loader:
set debug.acpi.layer="ACPI_PROCESSOR ACPI_NAMESPACE"
set debug.acpi.level="ACPI_LV_INFO ACPI_LV_FUNCTIONS"

and show me the last few messages from the driver? Also, can you
send me privately the output of the following command:
# acpidump -dt > P4V8X-X.asl

Cheers.

#7 Updated by marianitn almost 8 years ago

Yes, I know. But the problem is that nothing different happens setting
debug.acpi.layer and debug.acpi.level. The system just hangs at the
same point (ppi0: <Parallel I/O> on ppbus0) and without any new
messages. And when it hangs, I can't even scroll up to see if
something different shows on boot messages.
I also tried removing "ACPI_NAMESPACE" from debug.acpi.layer, but
nothing different happens.

Regards.-

#8 Updated by qhwt+dfly almost 8 years ago

You can use more verbose options found in acpi(4), but since you can't
scroll up once it hangs, you don't want to turn on ALL options yet.
I wrote another patch to find the last function call before it locks up;
please apply attached patch after reverting the first one if you still
have that, and tell me the last dmesg line which looks like this
(the number varies):
HERE: acpi_cpu_attach:206
In this case, it means that the driver locked up following the line 206 of
acpi_cpu.c(the name before the number is the function name, and it resides
in acpi_cpu.c).

And the next thing I'm going to tell you is to add two lines in the
acpi_cpu.c like this:

kprintf("HERE: %s:%d\n", __func__, __LINE__); /* <== line 206 */
AcpiDbgLayer = ACPI_ALL_COMPONENTS;
AcpiDbgLevel = ACPI_LV_ALL;
status = AcpiEvaluateObject(handle, NULL, NULL, &buf); /* <== possibly offending statement */
if (ACPI_FAILURE(status)) {

Probably ACPI_ALL_COMPONENTS displays too much messages, so we need
to reduce it to a few component names. But before doing that, we need
to locate the offending line in acpi_cpu.c.

Cheers.

#9 Updated by marianitn almost 8 years ago

OK, the last line was 219.

I added those two lines below 219, but after that the screen was
flooded with messages for more than 10 minutes, and I can't see/take a
picture of the last lines. So I tried with AcpiDbgLayer =
ACPI_PROCESSOR, and took a picture of the last messages of acpi driver
before . You can find it here
http://www.e-nix.com.ar/~maliaga/ACPI_PROCESSOR.jpg
I also tried with AcpiDbgLayer = ACPI_NAMESPACE, but the screen was
flooded again.

HTH

Regards.-

#10 Updated by qhwt+dfly almost 8 years ago

Really? The lines after 219 until the next kprintf inserted look
like this, right?

219 kprintf("HERE: %s:%d\n", __func__, __LINE__);
220
221 /*
222 * Find the processor associated with our unit. We could use the
223 * ProcId as a key, however, some boxes do not have the same values
224 * in their Processor object as the ProcId values in the MADT.
225 */
226 acpi_id = obj->Processor.ProcId;
227 AcpiOsFree(obj);
228 if (acpi_pcpu_get_id(device_get_unit(dev), &acpi_id, &cpu_id) != 0)
229 return (ENXIO);
230
231 kprintf("HERE: %s:%d\n", __func__, __LINE__);

Hmm.. what happens if you undo the patch, and change above code to:
%%%%
acpi_id = obj->Processor.ProcId;
kprintf("Before calling AcpiOsFree()\n");
AcpiOsFree(obj);
kprintf("After calling AcpiOsFree()\n");
if (acpi_pcpu_get_id(device_get_unit(dev), &acpi_id, &cpu_id) != 0) {
kprintf("acpi_pcpu_get_id() failed\n");
return (ENXIO);
}
kprintf("acpi_id: %d, cpu_id: %d\n", acpi_id, cpu_id);

%%%%
without debug.acpi.layer or debug.acpi.level in /boot/loader.conf.

BTW, to specify "all ACPI-CA components except for namespace and processor",
you can change it to:

AcpiDbgLayer = ACPI_ALL_COMPONENTS & ~(ACPI_NAMESPACE | ACPI_PROCESSOR);

But I wrote that part in the previous message in an assumption that the
lock-up was during one of the calls to AcpiEvaluateObject().

Cheers.

#11 Updated by marianitn almost 8 years ago

Yes, those are the lines.

This is what I get: http://www.e-nix.com.ar/~maliaga/00002.jpg

Thanks for this, I didn't know how to do it.

Regards.-

#12 Updated by qhwt+dfly almost 8 years ago

Probably I should've asked earlier; if you booted FreeBSD, or DragonFly
built from source code before import of new ACPI-CA, do you have
hw.acpi.cpu sysctl knob? The picture is somewhat overwrapped, but it
seems that acpi_pcpu_get_id() failed for both units. You saw 4
"acpi_pcpu_get_id() failed" message, right? I think the lock-up occurred
somewhere other than acpi_cpu_probe(), after returning from it, but
whether it locks up or not depends on what acpi_cpu_probe() does.

I wrote another patch to find it. Please apply attached patch after
reverting the previous change, and install the driver. Boot into the
boot loader prompt, and type(without other debug.acpi variables set)
the following two lines to boot into single-user mode:
set debug.acpi.cpu.knob=5
boot -sv
if it locks up, write down(or take a picture of) lines containing "ProcId",
which looks like this:
cpu0: ProcId 1: acpi_id=0, cpu_id=0
or
cpu0: ProcId 1: can't find cpu_id

Then reboot into the boot loader, and this time try 4 instead of 5.
Repeat this until it stops locking up (or when the number reached to 0),
and tell me the number.
(you don't need to write down the "ProcId" line; only the first time).

Cheers.

#13 Updated by marianitn almost 8 years ago

Yes, "acpi_pcpu_get_id() failed" appears 4 times.

With debug.acpi.cpu.knob=5 I get 2 of this messages (dmesg attached),
and the system doesn't lock up. I tried 4, 3, 2, 1 just in case, but
Procid messages don't appear and the system also doesn't lock up.

Regards.-

#14 Updated by qhwt+dfly almost 8 years ago

Does this reliably reproduce? I mean it always yields the same result,
across reboot or after power-cycle? I see 6 ProcId lines in dmesg output,
not 4, and none of them failed, unlike it did in the last picture.

If it does, please try yet another patch attached to this message(after
reverting the last patches). Also please make sure that the files in
/sys/dev/acpica5 or /sys/platform/pc32/acpica5 are up to date before
rebuilding the acpi driver. This time you are going to try 6 to 11 for
debug.acpi.cpu.knob and see when it locks up. If I'm not mistaken,
it should lock up before reaching to 11.

Cheers.

#15 Updated by marianitn almost 8 years ago

Yes, it's always the same result: 6 ProcId lines. BUT, I wasn't sure
about how many times "acpi_pcpu_get_id() failed" appeared on previous
test. So I reverted changes on acpi_cpu.c and modified these lines as
before:

%%%%
acpi_id = obj->Processor.ProcId;
kprintf("Before calling AcpiOsFree()\n");
AcpiOsFree(obj);
kprintf("After calling AcpiOsFree()\n");
if (acpi_pcpu_get_id(device_get_unit(dev), &acpi_id, &cpu_id) != 0) {
kprintf("acpi_pcpu_get_id() failed\n");
return (ENXIO);
}
kprintf("acpi_id: %d, cpu_id: %d\n", acpi_id, cpu_id);

%%%%

and started taking pictures to get a better shot. After watching them,
I saw that "acpi_pcpu_get_id() failed" appears only 3 times
(http://www.e-nix.com.ar/~maliaga/00003.jpg and
(http://www.e-nix.com.ar/~maliaga/00004.jpg). So sorry, I thought they
were 4 :-(

For this I cvsupped last changes, applied your patch, and tried from
knob 6 to 11. Unfortunately, it didn't lock on any of the tests, and
the messages where the same than previous tests with knob 0 to 5.

Regards.-

#16 Updated by qhwt+dfly almost 8 years ago

That's OK, debugging without a working keyboard is hard anyway :)

Hmm... have you tried without setting debug.acpi.cpu.knob? If I remember
correctly, the function trace for ACPI_PROCESSOR didn't show a line for
acpi_cpu_attach(), so I believed that it locked up before reaching to
attachment routine. But if you forced to return at the entrance of
attachment routine(with knob 11) and it doesn't lock up. That confuses me.
Anyway, I wrote another patch to narrow down the location. Please undo
any changes to files under /sys/dev/acpica5, apply the patch attached
to this message, and reinstall the driver. Now you have knobs 1 to 5.
This time, please add the following two lines in /boot/loader.conf:
debug.acpi.layer="ACPI_PROCESSOR"
debug.acpi.layer="ACPI_LV_INFO"

and let me know lines in dmesg output beginning with "acpi_cpu"
if knob=4 didn't lock up your computer.

Cheers.

#17 Updated by qhwt+dfly almost 8 years ago

Oops, sorry.

#18 Updated by marianitn almost 8 years ago

Resolved. YONETANI's commit on acpi_cpu.c fixed the problem for me. Thank you!

Also available in: Atom PDF