Bug #607
closed
lenovo R60 SMP/APIC_IO woes
Added by andrew_atrens over 17 years ago. Updated almost 13 years ago.
Description
Hey Folks,
Just got an IBM ThinkPad R60 ... it's got a "Centrino Duo" dual core processor, so I thought I'd
try an SMP kernel ...
... and interrupt delivery does not work for bge0, nor for sound, nor for the disk (when I'm in AHCI
mode; in BIOS compatibility mode (UDMA33) it does work).
Symptoms are: bge0 watchdog timeouts (with ethereal I see TXs on the wire, but can't receive
any packets). Sound doesn't work: "pcm0:play:0:dsp0.0: play interrupt timeout, channel dead". For
the disk, the controller is detected, but no disks are found.
All in all, pretty broken. :(
Oh, EST is broken, too, but that's because the CPU runs at 1.66GHz and the cpu reports as -
hw.model: Genuine Intel(R) CPU T2300 @ 1.66GHz
I suspect this cpu is supported on NetBSD, so I'll take a shot at merging the newer NetBSD version ...
Would dearly love to be able to turn up the second core on this puppy! :)
Cheers,
Andrew
Updated by andrew_atrens over 17 years ago
Should have mentioned - I'm running HEAD
Updated by c.turner over 17 years ago
Had similar problems with an acer F1000 with bge0
(which processor wise is a turion X2 proc)
did you try with SMP and without APIC_IO ?
that seemed to fix my bge lockups.
haven't tried sound, and similarly have not been able to get the hdd
working outside PATA compatibility mode. NATA didn't work for me but I've not yet
been able to issue a coherent investigation report (or try -HEAD)..
it might be worth a shot for you.
(for NATA, dig around on the wiki, there is something about what is needed)
not too worried on this because 'ACPI idle hook not enabled for SMP'..
my laptop is a work in progress.. but as long as there's power it's ok :)
good luck
Updated by andrew_atrens over 17 years ago
It is - a T2300
Nope. Haven't tried that!
Hmm. I thought APIC_IO was needed for SMP to work... I've never been 100% sure about how the two
are coupled though.
I'm using -HEAD and nata - works for me :)
Oh, I suppose. Is that what estd uses to decide when to slow stuff down ?
thanks for the tip! :)
Cheers,
Andrew
Updated by c.turner over 17 years ago
:I've never been 100% sure about how the two
:are coupled though.
see the 'X2' related (AMDX2?) config in post 1.8.1ish /src/sys/config ..
not sure how that relates to bge, but in any case, perhaps some chipset
issues.
does that mean that it fixed the problem, or legacy mode is 'good
enough' for now?
am a humble user and erstwhile list-lurker - so can't comment on that,
but I didn't worry too much about power consumption after seeing it
in my dmesg and knowing my suspend doesn't work (again, need to send
reports) until I can dig further :) but again, as a lurker I am
'non-authoritative' as to the current state of the art here..
Updated by dillon over 17 years ago
:Hmm. I thought APIC_IO was needed for SMP to work... I've
:never been 100% sure about how the two
:are coupled though.
Most SMP systems ought to work without APIC_IO. Basically there are
two types of APICs. Each cpu has a LAPIC, and the motherboard chipset
has one or more IOAPICs. SMP requires the per-cpu LAPIC to be present
but doesn't require the IOAPICs.
When you compile a SMP kernel without APIC_IO it will compile in LAPIC
support but not IOAPIC support, and depends on the legacy 8259 (called
the 'PIC') to route interrupts, which is sometimes more compatible
with the BIOS.
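For illustration, the kernel configuration for that setup might look like the fragment below. It is a sketch only: it assumes the option names SMP and APIC_IO discussed in this thread, and the rest of the config file is omitted.

```
# SMP kernel that relies on the legacy 8259 PIC for interrupt routing
options         SMP             # compiles in per-cpu LAPIC support
#options        APIC_IO         # left out, so no IOAPIC support is compiled in
```

Uncommenting the APIC_IO line would switch the same kernel to IOAPIC routing.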
There are lots of potential gotchas but usually a SMP box will work
in one or the other modes (IOAPIC or legacy PIC). If all else fails
you can turn on emergency interrupt polling in /boot/loader.conf or
with a sysctl. With emergency interrupt polling turned on only the
timer and IDE disk IRQs need to be properly routed, and they
usually are.
kern.emergency_intr_enable (default 0 = off)
kern.emergency_intr_freq (default 10 = 10hz)
-Matt
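As a rough sketch of the two ways to turn this on (the values are examples; 10 is the default frequency quoted above), the boot-time settings would go into /boot/loader.conf:

```
kern.emergency_intr_enable="1"
kern.emergency_intr_freq="10"
```

The run-time equivalent, as root, would be along these lines:

```
sysctl kern.emergency_intr_enable=1
sysctl kern.emergency_intr_freq=10
```

At 10 Hz the polling period is 100 ms, so a device whose interrupt never arrives would be serviced roughly every 100 ms; raising the frequency shortens that wait at the cost of more polling overhead.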
Updated by andrew_atrens over 17 years ago
Nice to know! Thanks Matt!
In the last few hours I've noted about 24 'stray interrupt 7' messages while
running SMP (no IOAPIC) .. I'm guessing that these are incorrectly routed
interrupts?
Do these need to be set by the bootloader, I suppose so, eh ?
Thanks for all the info, Matt! As always you're a great help!
... any ideas about my other problem (previous message) wrt 1/2 speed kernel builds?
Cheers,
Andrew
Updated by dillon over 17 years ago
:In the last few hours I've noted about 24 'stray interrupt 7' messages while
:running SMP (no IOAPIC) .. I'm guessing that these are incorrectly routed
:interrupts?
No, it's probably a summary interrupt from the IOAPIC. Because the BIOS
is stupid, hardware chipsets are often able to route ALL of the masked
IOAPIC interrupts to a single PIC interrupt. It's usually IRQ 7 or
IRQ 15. Since the method to disable this routing is not usually
documented, we have no way of turning it off short of hoping that the
BIOS knows how to do it when we use the IOAPIC. Some IOAPICs
automatically route this special interrupt if their pins are masked.
Unfortunately, pin masking is a normal part of our interrupt handling
code (Linux does something nutty to try to avoid pin masking).
The only real solution is for us to not use IRQ 7 at all and just
mask it out on the PIC.
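To make the idea of masking IRQ 7 on the PIC concrete, here is a minimal C sketch. It assumes the conventional PC layout, where the master 8259's interrupt mask register (IMR) is accessed through I/O port 0x21 and bit n masks IRQ n. The helper name pic_mask_irq is hypothetical and this is not the DragonFly kernel's actual code; it only computes the new mask value and leaves the port write (shown in a comment) to the caller.

```c
#include <stdint.h>

/* Master 8259 data port; on a standard PC the IMR is read and written here. */
#define PIC1_DATA_PORT 0x21

/*
 * Hypothetical helper: given the current IMR value, return the value
 * with the given IRQ line (0-7) masked. A set bit disables that line.
 */
static uint8_t
pic_mask_irq(uint8_t imr, unsigned int irq)
{
	return (uint8_t)(imr | (1u << (irq & 7u)));
}

/*
 * In kernel context the result would be written back with port I/O,
 * for example (assuming inb/outb style accessors are available):
 *
 *	outb(PIC1_DATA_PORT, pic_mask_irq(inb(PIC1_DATA_PORT), 7));
 */
```

With all lines initially unmasked (IMR 0x00), masking IRQ 7 yields 0x80; bits for the other lines are left as they were.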
:> kern.emergency_intr_enable (default 0 = off)
:> kern.emergency_intr_freq (default 10 = 10hz)
:
:Do these need to be set by the bootloader, I suppose so, eh ?
It can be set by the bootloader and/or at run-time with a sysctl.
:Thanks for all the info, Matt! As always you're a great help!
:
:... any ideas about my other problem (previous message) wrt 1/2 speed kernel builds?
:
:Cheers,
:
:Andrew
All I can think of is that it's having real problems routing interrupts.
Are you using emergency interrupt polling or does it work without that?
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by elekktretterr over 17 years ago
Hi Matt,
Reading this makes me wonder, what does FreeBSD do about this issue?
They do not have an APIC_IO option (as far as I know they have just option
SMP) and it seems to be relatively trouble free in regards to
IOAPIC/LAPIC. On DragonFly it seems to be an issue more often.
Petr
Updated by andrew_atrens over 17 years ago
It works without emergency interrupt polling. My compile tests were run with SMP (no APIC_IO and
no emergency interrupt polling).
I'm just now re-running my kernel compile test with e-i-p enabled to see if it improves the speed.
Okay, that's better - compile time dropped from 13 minutes to just over 7 minutes! That's roughly
the same as my UP kernel compile time.
Now I'll try with -j 2 to see if that helps ..
Whoa! 4 minutes! Very cool!
Mucho better! :) :)
Now .. speculating as to the cause ... could it be that interrupts are only being delivered to
CPU0 and not CPU1 ? That would explain the roughly 1/2 speed slow down during the test if both
cpus were alternating running compile processes ..
Andrew
Updated by dillon over 17 years ago
:Whoa! 4 minutes! Very cool!
:
:Mucho better! :) :)
:
:Now .. speculating as to the cause ... could it be that interrupts are only being delivered to
:CPU0 and not CPU1 ? That would explain the roughly 1/2 speed slow down during the test if both
:cpus were alternating running compile processes ..
:
:Andrew
Nope. Well, yup... but that's normal. When the PIC is used interrupts
are usually only delivered to one cpu. But it wouldn't cause a
slow down. Performance would be about the same.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by dillon over 17 years ago
:Hi Matt,
:Reading this makes me wonder, what does FreeBSD do about this issue?
:They do not have an APIC_IO option (as far as I know they have just option
:SMP) and it seems to be relatively trouble free in regards to
:IOAPIC/LAPIC. On DragonFly it seems to be an issue more often.
:
:Petr
They've moved the bar a bit more forward, but there are still problems.
The main difference between FreeBSD and us is that FreeBSD has done
a lot more work on supporting interrupt routing via ACPI.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by belczyk over 17 years ago
You might want to take a look at my attempt to make EST work on CORE cpus.
It's at http://bsd.krakow.pl/df/est_core.tgz. It's still a work in progress,
full of debug, etc. but seems to work on my CORE2 (T5600). Unfortunately I
can't spend any time on it within the next month because I've got to focus on my
thesis.
Updated by swildner almost 13 years ago
- Description updated (diff)
- Status changed from New to Closed
- Assignee deleted (0)
Assume that this is fixed by now. If there are still issues, please reopen.