Bug #352

SMP weirdness in 1.6.2-RELEASE

Added by wettoast over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Greetings, I am an experienced FreeBSD user who decided to give DragonFly a
whirl. I wanted to do some benchmarks with SMP, so I cvsuped to the latest 1.6.2
source, did a buildworld, etc., and enabled SMP. However, the system seems to
freeze when SMP is enabled, without any panic, while UP works just fine. More
accurately, the console is frozen, but I can still use ssh sometimes. Sometimes
it freezes before the login prompt comes up; other times it comes up but seems to
function oddly, and it always freezes solid when issuing a reboot command. This
is what I mean by oddly:

WARNING: was not properly dismounted

^ no device name printed.

# time md5 -t
MD5 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 766a2bb5d24bddae466c572bcabca3ee
Time = 0.000000 seconds
Speed = inf bytes/second

real 0m10.588s
user 0m0.000s
sys 0m0.000s
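
The zero Time and infinite Speed above are what one would expect if the measured CPU time never advanced while about ten seconds of wall-clock time passed. A minimal Python sketch of that arithmetic follows. It assumes the trial divides the bytes digested by the measured seconds in IEEE floating point; the function name `trial_speed` is hypothetical and is not taken from the md5 source.

```python
import math

def trial_speed(blocks: int, block_size: int, seconds: float) -> float:
    """Return bytes per second, mimicking IEEE division where x / 0.0 is inf."""
    total_bytes = blocks * block_size
    if seconds == 0.0:
        # Python raises ZeroDivisionError for float division by zero,
        # whereas C floating point yields inf, so emulate that here.
        return math.inf
    return total_bytes / seconds

# Figures from the report: 100000 blocks of 10000 bytes, 0.000000 seconds.
print(trial_speed(100000, 10000, 0.0))
# The same workload measured against the 10.588 s of wall-clock time.
print(trial_speed(100000, 10000, 10.588))
```

The second call only shows what a sane reading would look like if the elapsed time had been accounted for; it is not a measured result.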

This is an older Dell PowerEdge 2400 system, which works just fine under Windows
Server 2003 and FreeBSD 6. I am using a GENERIC kernel with SMP enabled. I can
produce a dump and assist with debugging. Below are the system details.

---
DragonFly 1.6.2-RELEASE #1: Sun Oct 15 17:05:30 EDT 2006
:/usr/src/sys/compile/GENERIC i386

cvsuped on the ~14th.

Oct 15 17:41:07 dfly kernel: Copyright (c) 2003, 2004, 2005, 2006 The DragonFly
Project.
Oct 15 17:41:07 dfly kernel: Copyright (c) 1992-2003 The FreeBSD Project.
Oct 15 17:41:07 dfly kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989,
1991, 1992, 1993, 1994
Oct 15 17:41:07 dfly kernel: The Regents of the University of California. All
rights reserved.
Oct 15 17:41:07 dfly kernel: DragonFly 1.6.2-RELEASE #0: Sun Oct 15 16:45:36 EDT
2006
Oct 15 17:41:07 dfly kernel: :/usr/src/sys/compile/GENERIC
Oct 15 17:41:07 dfly kernel: TSC clock: 860884863 Hz, i8254 clock: 1193188 Hz
Oct 15 17:41:07 dfly kernel: CPU: Intel Pentium III (860.89-MHz 686-class CPU)
Oct 15 17:41:07 dfly kernel: Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Oct 15 17:41:07 dfly kernel:
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
Oct 15 17:41:07 dfly kernel: real memory = 536862720 (524280K bytes)
Oct 15 17:41:07 dfly kernel: avail memory = 508981248 (497052K bytes)
Oct 15 17:41:07 dfly kernel: Changing APIC ID for IO APIC #0 from 0 to 2 on chip
Oct 15 17:41:07 dfly kernel: Changing APIC ID for IO APIC #1 from 0 to 3 on chip
Oct 15 17:41:07 dfly kernel: Programming 16 pins in IOAPIC #0
Oct 15 17:41:07 dfly kernel: Programming 16 pins in IOAPIC #1
Oct 15 17:41:07 dfly kernel: DragonFly/MP: Multiprocessor motherboard
Oct 15 17:41:07 dfly kernel: cpu0 (BSP): apic id: 1, version: 0x00040011, at
0xfee00000
Oct 15 17:41:07 dfly kernel: cpu1 (AP): apic id: 0, version: 0x00040011, at
0xfee00000
Oct 15 17:41:07 dfly kernel: io0 (APIC): apic id: 2, version: 0x000f0011, at
0xfec00000
Oct 15 17:41:07 dfly kernel: io1 (APIC): apic id: 3, version: 0x000f0011, at
0xfec01000
Oct 15 17:41:07 dfly kernel: Preloaded elf kernel "/kernel.old" at 0xc074e000.
Oct 15 17:41:07 dfly kernel: Preloaded elf module "/modules/acpi.ko" at 0xc074e1ec.
Oct 15 17:41:07 dfly kernel: Pentium Pro MTRR support enabled
Oct 15 17:41:07 dfly kernel: md0: Malloc disk
Oct 15 17:41:07 dfly kernel: pcibios: BIOS version 2.10
Oct 15 17:41:07 dfly kernel: Using $PIR table, 11 entries at 0xc00fc320
Oct 15 17:41:07 dfly kernel: npx0: <math processor> on motherboard
Oct 15 17:41:07 dfly kernel: npx0: INT 16 interface
Oct 15 17:41:07 dfly kernel: Using MMX optimized bcopy/copyin/copyout
Oct 15 17:41:07 dfly kernel: acpi0: <DELL PE2400> on motherboard
Oct 15 17:41:07 dfly kernel: acpi0: Power Button (fixed)
Oct 15 17:41:07 dfly kernel: Warning: ACPI is disabling APM's device. You can't
run both
Oct 15 17:41:07 dfly kernel: acpi_timer0: <32-bit timer at 3.579545MHz> port
0x808-0x80b on acpi0
Oct 15 17:41:07 dfly kernel: cpu0: <ACPI CPU> on acpi0
Oct 15 17:41:07 dfly kernel: compare 0
Oct 15 17:41:07 dfly kernel: legacypci0 on motherboard
Oct 15 17:41:07 dfly kernel: pcib0: <ServerWorks NB6635 3.0LE host to PCI
bridge> on legacypci0
Oct 15 17:41:07 dfly kernel: pci0: <PCI bus> on pcib0
Oct 15 17:41:07 dfly kernel: pcib1: <PCI to PCI bridge (vendor=8086
device=0962)> at device 2.0 on pci0
Oct 15 17:41:07 dfly kernel: pci1: <PCI bus> on pcib1
Oct 15 17:41:07 dfly kernel: ahc0: <Adaptec aic7880 Ultra SCSI adapter> port
0xfc00-0xfcff mem 0xfcfff000-0xfcffffff irq 11 at device 6.0 on pci1
Oct 15 17:41:07 dfly kernel: aic7880: Ultra Single Channel A, SCSI Id=7, 16/253 SCBs
Oct 15 17:41:07 dfly kernel: aac0: <Dell PERC 2/Si> mem 0xf4000000-0xf7ffffff
irq 2 at device 2.1 on pci0
Oct 15 17:41:07 dfly kernel: aac0: i960RX 100MHz, 54MB cache memory, no battery
support
Oct 15 17:41:07 dfly kernel: aac0: Kernel 2.8-0, Build 6089, S/N 3801d0
Oct 15 17:41:07 dfly kernel: aac0: Supported
Options=2558<DATA64,HOSTTIME,WINDOW4GB,SOFTERR,SGMAP64>
Oct 15 17:41:07 dfly kernel: fxp0: <Intel 82559 Pro/100 Ethernet> port
0xecc0-0xecff mem 0xfe000000-0xfe0fffff,0xfe101000-0xfe101fff irq 10 at device
8.0 on pci0
Oct 15 17:41:07 dfly kernel: miibus0: <MII bus> on fxp0
Oct 15 17:41:07 dfly kernel: inphy0: <i82555 10/100 media interface> on miibus0
Oct 15 17:41:07 dfly kernel: inphy0: 10baseT, 10baseT-FDX, 100baseTX,
100baseTX-FDX, auto
Oct 15 17:41:07 dfly kernel: fxp0: MAC address: 00:b0:d0:ab:3e:8c
Oct 15 17:41:07 dfly kernel: pci0: <ATI model 4759 graphics accelerator> at 14.0
Oct 15 17:41:07 dfly kernel: isab0: <ServerWorks IB6566 PCI to ISA bridge> port
0x580-0x58f at device 15.0 on pci0
Oct 15 17:41:07 dfly kernel: isa0: <ISA bus> on isab0
Oct 15 17:41:07 dfly kernel: pcib2: <ServerWorks NB6635 3.0LE host to PCI
bridge> on legacypci0
Oct 15 17:41:07 dfly kernel: pci2: <PCI bus> on pcib2
Oct 15 17:41:07 dfly kernel: orm0: <Option ROMs> at iomem
0xc0000-0xc7fff,0xc8800-0xcc7ff on isa0
Oct 15 17:41:07 dfly kernel: pmtimer0 on isa0
Oct 15 17:41:07 dfly kernel: fdc0: <NEC 72065B or clone> at port
0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
Oct 15 17:41:07 dfly kernel: fdc0: FIFO enabled, 8 bytes threshold
Oct 15 17:41:07 dfly kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0
Oct 15 17:41:07 dfly kernel: ata0 at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0
Oct 15 17:41:07 dfly kernel: ata1 at port 0x170-0x177,0x376 irq 15 on isa0
Oct 15 17:41:07 dfly kernel: atkbdc0: <Keyboard controller (i8042)> at port
0x60,0x64 on isa0
Oct 15 17:41:07 dfly kernel: atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
Oct 15 17:41:07 dfly kernel: kbd0 at atkbd0
Oct 15 17:41:07 dfly kernel: psm0: <PS/2 Mouse> irq 12 on atkbdc0
Oct 15 17:41:07 dfly kernel: psm0: model IntelliMouse, device ID 3
Oct 15 17:41:07 dfly kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem
0xa0000-0xbffff on isa0
Oct 15 17:41:07 dfly kernel: sc0: <System console> at flags 0x100 on isa0
Oct 15 17:41:07 dfly kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Oct 15 17:41:07 dfly kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
Oct 15 17:41:07 dfly kernel: sio0: type 16550A
Oct 15 17:41:07 dfly kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
Oct 15 17:41:07 dfly kernel: sio1: type 16550A
Oct 15 17:41:07 dfly kernel: ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
Oct 15 17:41:07 dfly kernel: ppc0: Generic chipset (ECP/PS2/NIBBLE) in
COMPATIBLE mode
Oct 15 17:41:07 dfly kernel: ppc0: FIFO with 16/16/8 bytes threshold
Oct 15 17:41:07 dfly kernel: ppbus0: <Parallel port bus> on ppc0
Oct 15 17:41:07 dfly kernel: plip0: <PLIP network interface> on ppbus0
Oct 15 17:41:07 dfly kernel: lpt0: <Printer> on ppbus0
Oct 15 17:41:07 dfly kernel: lpt0: Interrupt-driven port
Oct 15 17:41:07 dfly kernel: ppi0: <Parallel I/O> on ppbus0
Oct 15 17:41:07 dfly kernel: APIC_IO: Testing 8254 interrupt delivery
Oct 15 17:41:07 dfly kernel: APIC_IO: Broken MP table detected: 8254 is not
connected to IOAPIC #0 intpin 2
Oct 15 17:41:07 dfly kernel: APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
Oct 15 17:41:07 dfly kernel: SMP: AP CPU #1 Launched!
Oct 15 17:41:07 dfly kernel: Warning: ACPI idle hook not yet supported for SMP
Oct 15 17:41:07 dfly kernel: Waiting 5 seconds for SCSI devices to settle
Oct 15 17:41:07 dfly kernel: aacd0: <RAID 0 (Stripe)> on aac0
Oct 15 17:41:07 dfly kernel: aacd0: 17354MB (35542272 sectors)
Oct 15 17:41:07 dfly kernel: sa0 at ahc0 bus 0 target 6 lun 0
Oct 15 17:41:07 dfly kernel: sa0: <ARCHIVE Python 06408-XXX 8130> Removable
Sequential Access SCSI-3 device
Oct 15 17:41:07 dfly kernel: sa0: 20.000MB/s transfers (20.000MHz, offset 15)
Oct 15 17:41:07 dfly kernel: Mounting root from ufs:/dev/aacd0s2a
Oct 15 17:41:07 dfly kernel: cd0 at ahc0 bus 0 target 5 lun 0
Oct 15 17:41:07 dfly kernel: cd0: <NEC CD-ROM DRIVE:466 1.06> Removable CD-ROM
SCSI-2 device
Oct 15 17:41:07 dfly kernel: cd0: 20.000MB/s transfers (20.000MHz, offset 15)
Oct 15 17:41:07 dfly kernel: cd0: Attempt to query device size failed: NOT
READY, Medium not present

History

#1 Updated by justin over 7 years ago

On Sun, October 15, 2006 6:03 pm, Mike Jakubik wrote:

I see that missing device name too on a DragonFly desktop, which I think
is also a Dell ... 2400? I'm not near it to check. I have not tried an
SMP kernel in that machine, since it's a single CPU system.

#2 Updated by jontro over 7 years ago

I also see the missing device name after panics / unsafe shutdowns:
amr0: <LSILogic MegaRAID 1.51> mem 0xf6af0000-0xf6afffff irq 9 at
device 10.0 on pci2
amr0: <LSILogic MegaRAID SATA 150-4D> Firmware 713N, BIOS G119, 64MB RAM
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 715410MB (1465159680 sectors) RAID 5 (optimal)

/dev/amrd0s1a 248M 207M 21M 91% /
DragonFly 1.3.7-DEVELOPMENT #2: Sun Oct 30 17:50:46 GMT 2005

#3 Updated by wettoast over 7 years ago

I'm not really concerned with the lack of the device name, but more with the SMP
issues. Is no one interested in debugging this?

#4 Updated by justin over 7 years ago

I don't have the skills to fix the issue, but I could see if I can
duplicate it on my Dell machine that has the matching lack of device name,
though it's a uniprocessor machine. Should an SMP kernel function on a
uniprocessor machine?

#5 Updated by wettoast over 7 years ago

The problem appears to be related to interrupt routing.

APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
SMP: AP CPU #1 Launched!
Warning: ACPI idle hook not yet supported for SMP
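
To check another machine for the same symptom, the boot messages can be scanned for the APIC_IO warnings quoted in this report. The following is a minimal Python sketch, not a DragonFly tool. The function name `find_apic_warnings` is hypothetical, and the patterns cover only the message texts that appear in this thread.

```python
import re
from typing import List

# Patterns taken from the dmesg excerpts quoted in this report.
APIC_WARNINGS = [
    re.compile(r"APIC_IO: Broken MP table detected"),
    re.compile(r"APIC_IO: MP table broken"),
    re.compile(r"IOAPIC #\d+ pin \d+ does not exist"),
]

def find_apic_warnings(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that match any known MP-table warning."""
    hits = []
    for line in dmesg_text.splitlines():
        if any(pattern.search(line) for pattern in APIC_WARNINGS):
            hits.append(line.strip())
    return hits

sample = """APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
SMP: AP CPU #1 Launched!"""

for hit in find_apic_warnings(sample):
    print(hit)
```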

Following is a chat log of the advice I received on IRC.

[02:16pm] <wettoast> however, the server works fine in fbsd 6 and windows 2003
in smp
[02:16pm] <@TGEN> that's important
[02:16pm] <@TGEN> yes, the hardware in itself isn't really to blame
[02:16pm] <@TGEN> the code that configures the APICs probably is
[02:16pm] <wettoast> i see..
[02:17pm] <@TGEN> I'm not sure what to do about it though
[02:17pm] <@TGEN> interrupt routing is tedious business on x86
[02:17pm] <wettoast> maybe ill just make a note of that in the bug report
[02:17pm] <@TGEN> yeah, add those lines from dmesg and a short description
[02:18pm] <@TGEN> Matt might be able to figure something out, he's done this
before on DFly

#6 Updated by elekktretterr over 7 years ago

My server, which is SMP, produces

APIC_IO: MP table broken: IRQ 15 not ISA when IRQ 14 is!
Programming 16 pins in IOAPIC #0
Programming 16 pins in IOAPIC #1
Warning: IOAPIC #1 pin 17 does not exist, cannot program!

in dmesg. It seems to work as well as it should, but I'm just asking
why this happens.

To Mike:
I've seen that error before on one of the SMP boxes I've set up with
DragonFly. Have you tried updating the BIOS?

In my opinion, as long as it says CPU #x launched, that shouldn't be a
problem....

Petr

#7 Updated by dillon over 7 years ago

:My server, which is SMP, produces
:
:APIC_IO: MP table broken: IRQ 15 not ISA when IRQ 14 is!
:Programming 16 pins in IOAPIC #0
:Programming 16 pins in IOAPIC #1
:Warning: IOAPIC #1 pin 17 does not exist, cannot program!
:
:in dmesg. It seems to work as well as it should, but I'm just asking
:why this happens.
:
:To Mike:
:I've seen that error before on one of the SMP boxes I've set up with
:DragonFly. Have you tried updating the BIOS?
:
:In my opinion, as long as it says CPU #x launched, that shouldn't be a
:problem....
:
:Petr

I see this error on my test boxes. Insofar as I can tell, the BIOS
is generating a broken MP table.

I was able to work around the problem by commenting out APIC_IO. This
will cause the kernel to use the LAPIC for SMP related stuff and the
PIC (and not the APIC) for interrupt routing. It is a terrible hack,
but it often works.

options SMP # Symmetric MultiProcessor Kernel
#options APIC_IO # Symmetric (APIC) I/O
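
Before rebuilding, it can help to confirm that a kernel configuration file really has SMP active and APIC_IO commented out. Here is a minimal Python sketch of such a check. The helper names `active_options` and `uses_workaround` are hypothetical, and the parser assumes simple `options NAME` lines with `#` used only for comments.

```python
def active_options(config_text: str) -> set:
    """Collect option names from uncommented 'options NAME' lines."""
    names = set()
    for raw in config_text.splitlines():
        # Everything after '#' is treated as a comment (an assumption).
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "options":
            # Strip any '=value' suffix so only the option name is kept.
            names.add(parts[1].split("=", 1)[0])
    return names

def uses_workaround(config_text: str) -> bool:
    """True when SMP is enabled and APIC_IO is not."""
    opts = active_options(config_text)
    return "SMP" in opts and "APIC_IO" not in opts

config = """options SMP # Symmetric MultiProcessor Kernel
#options APIC_IO # Symmetric (APIC) I/O
"""
print(uses_workaround(config))
```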

-Matt

#8 Updated by wettoast over 7 years ago

Disabling APIC_IO seems to make the system sane. Are there any negative effects
from disabling it?

P.S. Although this is an old beast, I do not think this is a hardware problem,
as it works just fine with Windows 2003 and FreeBSD 6 (with APIC).

#9 Updated by dillon over 7 years ago

:Mike Jakubik added the comment:
:
:Disabling APIC_IO seems to make the system sane. Are there any negative effects
:from disabling it?
:
:P.S. Although this is an old beast, I do not think this is a hardware problem,
:as it works just fine with Windows 2003 and FreeBSD 6 (with APIC).

Mostly things that go unnoticed. The PIC has fewer IRQs than the APICs,
so more devices will wind up sharing an IRQ, and all PIC interrupts
are routed to cpu #0 (but at the moment we do that anyway even for
APIC_IO so no loss there). Some PC hardware routes a catch-all IRQ
to the PIC as well, usually to IRQ 7 or IRQ 15, and on such hardware
you pretty much have to disable the PIC entirely to get rid of the
spurious interrupts. Eventually PC hardware will stop supporting the
old PIC entirely. Some already don't.
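
To see how much IRQ sharing a given boot ends up with, the device lines in dmesg can be grouped by IRQ number. The following is a minimal Python sketch; the function names `devices_by_irq` and `shared_irqs` are hypothetical. It assumes unwrapped dmesg lines in which the device name starts the line and `irq N` appears on that same line. The `xyz0` entry in the sample is invented purely to demonstrate sharing and does not come from this report.

```python
import re
from collections import defaultdict

# Matches e.g. "sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0":
# leading device name, then the first "irq N" on the same line.
DEVICE_IRQ = re.compile(r"^(\w+)\b.*?\birq (\d+)\b")

def devices_by_irq(dmesg_text):
    """Map each IRQ number to the list of device names that use it."""
    table = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = DEVICE_IRQ.search(line.strip())
        if match:
            table[int(match.group(2))].append(match.group(1))
    return dict(table)

def shared_irqs(table):
    """Keep only the IRQs claimed by more than one device."""
    return {irq: devs for irq, devs in table.items() if len(devs) > 1}

sample = "\n".join([
    "ahc0: <Adaptec aic7880 Ultra SCSI adapter> port 0xfc00-0xfcff "
    "mem 0xfcfff000-0xfcffffff irq 11 at device 6.0 on pci1",
    "sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0",
    # Hypothetical device, added only to show a shared IRQ.
    "xyz0: <hypothetical device> irq 11 at device 9.0 on pci0",
])

print(shared_irqs(devices_by_irq(sample)))
```

Comparing the result from a boot with APIC_IO against one without it would show which devices ended up sharing a line under the PIC.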

For DragonFly this means we will have to play catch-up with FreeBSD
with regards to moving away from using the 8254 timer (using the LAPIC
timer instead), which allows us to disable the PIC entirely, and then
use APIC_IO. Eventually.

At the moment there are many, MANY BIOSes whose MP tables are totally
broken when it comes to IO APIC pin assignments. Yours is one of
those BIOSes. Such systems often have a working PIC, though, so
commenting out APIC_IO usually does the trick. ACPI is supposed to
replace the MP table but ACPI has its own problems. There are *FOUR*
BIOS interrupt routing interfaces and all of them are broken to some
degree or other.

-Matt
Matthew Dillon

#10 Updated by justin over 7 years ago

Removing APIC_IO is the fix for now.
