Bug #2117

ACPI and/or bce(4) problem with 2.11.0.673.g0d557 on HP DL380 G6

Added by pauska almost 3 years ago. Updated almost 3 years ago.

Status: New
Start date:
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

I have a standard HP ProLiant DL380 G6 server with a built-in quad-port Broadcom NIC.

2.10 didn't have the updated bcn drivers, so I installed the 2.11.0.673 snapshot
to get connectivity.

First, the ACPI error (also present in 2.10):
[ACPI Debug] String [0xB] "_TMP Method"

This message repeats 60 times every 10 minutes. I have no idea what it means;
googling for it only points me at a NetBSD discussion from 2009.

Secondly, the bcn driver (or perhaps atapci?):
interrupt                   total        rate
sio2                        0            0
sio0                        0            0
acpi0                       12125        0
bce0                        1547359      26
bce1/atapci0                2293301893   39875   <-- ouch?
bce2                        0            0
bce3                        0            0
uhci0/ehci0                 1            0
uhci2/uhci4                 34           0
uhci1/uhci3                 44           0
ciss0                       267683       4
swi_siopoll                 0            0
swi_cambio                  267762       4
swi_vm                      0            0
swi_taskq/swi_mp_taskq      25           0
Total                       2295396926   39911

The weird part is that I don't have any ATA devices in use; there's only a
CD-ROM. bcn1 isn't configured or marked up; only bcn0 is in use.
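
For anyone comparing several such listings, the noisiest source can be picked out mechanically. The following is a minimal sketch, not part of the original report: it assumes the three-column layout shown above (source name, total, rate), with an optional "irqN:" prefix, and the helper names parse_interrupts and busiest are made up for illustration. The sample text is an excerpt of the figures above with the "<-- ouch?" annotation removed.

```python
import re

# Excerpt of the listing above (annotation removed); an assumption-free copy
# of the numbers, not new data.
SAMPLE = """\
interrupt total rate
acpi0 12125 0
bce0 1547359 26
bce1/atapci0 2293301893 39875
ciss0 267683 4
Total 2295396926 39911
"""

# Optional "irqN:" prefix, then source name, total count, and rate.
ROW_RE = re.compile(r"^(?:irq\d+:\s+)?(\S+)\s+(\d+)\s+(\d+)$")


def parse_interrupts(text):
    """Return (source, total, rate) tuples, skipping the header and Total rows."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and m.group(1) != "Total":
            rows.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return rows


def busiest(rows):
    """Return the row with the highest rate column."""
    return max(rows, key=lambda row: row[2])


rows = parse_interrupts(SAMPLE)
name, total, rate = busiest(rows)
print(f"{name}: rate {rate}, total {total}")
```

Lines carrying trailing annotations are skipped by this pattern, so they would need to be cleaned up before parsing.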

The deal breaker here is that I can't do anything disk intensive without getting
a crash. I tried updating pkgsrc yesterday, and here are two examples:

[snip]
* [new branch] dragonfly-2010Q3 -> origin/dragonfly-2010Q3
*** Signal 10
Stop in /usr.
[snip]

[snip]
* [new branch] master -> origin/master
Bus error (core dumped)
*** Error code 1
Stop in /usr.
[snip]

While getting these errors, messages like this flooded dmesg:
intr 16 at 40001/40000 hz, livelocked limit engaged!
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 882/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
pid 34805 (git), uid 0: exited on signal 10 (core dumped)
intr 16 at 3225/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 751/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 765/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 795/20000 hz, livelock removed
[ACPI Debug] String [0xB] "_TMP Method"
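
To quantify how often the limiter trips, the dmesg text can be tallied per interrupt number. This is a rough sketch, not part of the original report: it assumes the exact message wording shown above, treats the two numbers before "hz" as opaque values, and the function name count_livelocks is hypothetical. The sample is copied from the first lines of the output above.

```python
import re
from collections import Counter

# First lines of the dmesg output quoted above.
LOG = """\
intr 16 at 40001/40000 hz, livelocked limit engaged!
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 882/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
pid 34805 (git), uid 0: exited on signal 10 (core dumped)
intr 16 at 3225/20000 hz, livelock removed
"""

LIVELOCK_RE = re.compile(
    r"^intr (\d+) at (\d+)/(\d+) hz, "
    r"(livelocked limit engaged!|livelock removed)$"
)


def count_livelocks(text):
    """Count 'engaged' and 'removed' messages per interrupt number."""
    engaged = Counter()
    removed = Counter()
    for line in text.splitlines():
        m = LIVELOCK_RE.match(line.strip())
        if m is None:
            continue  # ACPI debug lines, process exits, and so on
        irq = int(m.group(1))
        if m.group(4).startswith("livelocked"):
            engaged[irq] += 1
        else:
            removed[irq] += 1
    return engaged, removed


engaged, removed = count_livelocks(LOG)
for irq in sorted(engaged):
    print(f"intr {irq}: engaged {engaged[irq]} times, removed {removed[irq]} times")
```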

I'm not familiar with debugging this, so please let me know if you need more
info. I can also put the server in the DMZ and give a developer SSH access if
needed.

dmesg.txt (68.8 KB) pauska, 08/18/2011 10:49 AM

History

#1 Updated by pauska almost 3 years ago

I forgot to mention that there is a Raritan KVM attached which emulates storage
devices (for mounting ISOs etc. via the KVM client). Could this be the cause of
the atapci interrupts?

#2 Updated by pauska almost 3 years ago

bce, not bcn. Sorry :(

#3 Updated by pauska almost 3 years ago

Update: the interrupt storm went away after disabling S-ATA in the BIOS.

#4 Updated by pauska almost 3 years ago

Update2: The interrupt storm came back on irq16/bce0 after doing heavy downloads
via git.

intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 827/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 1223/20000 hz, livelock removed
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 40001/40000 hz, livelocked limit engaged!
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 2838/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 2234/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 2685/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 790/20000 hz, livelock removed
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 825/20000 hz, livelock removed
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
[ACPI Debug] String [0xB] "_TMP Method"
seg-fault ft=0002 ff=000c addr=0x7fffffbffff8 rip=0x4e406a pid=814 p_comm=git
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 1169/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 954/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
pid 814 (git), uid 0: exited on signal 10 (core dumped)
intr 16 at 551/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 760/20000 hz, livelock removed
[ACPI Debug] String [0xB] "_TMP Method"
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 772/20000 hz, livelock removed
intr 16 at 40001/40000 hz, livelocked limit engaged!
intr 16 at 911/20000 hz, livelock removed

interrupt                       total     rate
irq3: sio2                      0         0
irq4: sio0                      0         0
irq9: acpi0                     12113     12
irq16: bce0                     1468203   1478
irq17: bce1/atapci0             0         0
irq18: bce2                     0         0
irq19: bce3                     0         0
irq20: uhci0/ehci0              1         0
irq22: uhci2/uhci4              1         0
irq23: uhci1/uhci3              1         0
irq28: ciss0                    185332    186
irq192: swi_siopoll             0         0
irq195: swi_cambio              185396    186
irq196: swi_vm                  0         0
irq197: swi_taskq/swi_mp_taskq  0         0
Total                           1851047   1864

#5 Updated by pauska almost 3 years ago

Update3: Disabling ACPI in the boot loader made it even worse: now the NIC won't
come up (getting "bce0: Watchdog timeout occured, resetting!" repeatedly in the
console).

#6 Updated by pauska almost 3 years ago

Verbose boot dmesg attached
