Bug #2652 (open)
189a0ff3761b47 ... ix: Implement MSI-X support locks up Lenovo S10 Intel Atom n270
Description
On an i386 Lenovo S10 (Intel Atom N270) netbook, bisection indicates that commit
189a0ff3761b47 ... ix: Implement MSI-X support and enable multiple TX rings
locks up the machine during boot at this point:
...
md0: invalid primary partition table: no magic
Math emulator present
hptrr: RocketRAID 17xx/2xxx SATA controller driver v1.2
hpt27xx: RocketRAID 27xx controller driver v1.0 (Feb 28 2014 21:38:19)
Attached is a full verbose dmesg from the same machine running master
just before the above commit. The machine only boots fully with ACPI
disabled via hint.acpi.0.disabled=1. With ACPI enabled, the lockup caused
by the problematic commit occurs earlier than the usual ACPI-enabled
lockup. "Normally" on this machine, booting with ACPI halts at
acpi0.nexus0.root0
acpi0: <LENOVO CB-01> [tentative] on motherboard
ACPI: All ACPI Tables successfully acquired
ACPI FADT: SCI testing interrupt mode ...
ACPI FADT: SCI testing level/high
IOAPIC: irq 9, gsi 9 edge/high -> level/high
Updated by swildner over 10 years ago
On Mon, 03 Mar 2014 22:20:17 +0100,
<bugtracker-admin@leaf.dragonflybsd.org> wrote:
Issue #2652 has been reported by davshao.
----------------------------------------
Bug #2652: 189a0ff3761b47 ... ix: Implement MSI-X support locks up
Lenovo S10 Intel Atom n270
http://bugs.dragonflybsd.org/issues/2652
- Author: davshao
- Status: New
- Priority: Normal
- Assignee:
- Category:
- Target version:
----------------------------------------
For a i386 Lenovo S10 Intel Atom n270 netbook, bisection indicates using
189a0ff3761b47 ... ix: Implement MSI-X support and enable multiple TX rings
locks up the machine on booting at the point:
...
md0: invalid primary partition table: no magic
Math emulator present
hptrr: RocketRAID 17xx/2xxx SATA controller driver v1.2
hpt27xx: RocketRAID 27xx controller driver v1.0 (Feb 28 2014 21:38:19)
I don't think the ix commit is related to it. This looks like the old i386
bug we have that it randomly fails booting sometimes. Here it happens
occasionally (more often after changes to /boot/loader.conf, it seems) and
after a few retries (reset button) it starts booting again.
What it is, I don't know, but I'm relatively sure that it is not related
to either the ix commit (which at this early point of the boot will not
affect anything) or the hpt27xx driver.
Sascha
Updated by davshao over 10 years ago
I built the kernel multiple times, stepping through the commits one by one with git reset --hard, and the problems began exactly with the mentioned commit.
Furthermore, I would not assume that the hpt27xx driver can't be partially responsible. Doesn't hpt_init() in sys/dev/raid/hpt27xx/osm_bsd.c call an init_config() function, which in turn is, I believe, defined as hpt27xx_init_config in hpt27xx_config.h? But hpt27xx_init_config() is not defined in the source. Is it a function called from the HighPoint binary blob?
Updated by swildner over 10 years ago
On Thu, 06 Mar 2014 08:38:14 +0100,
<bugtracker-admin@leaf.dragonflybsd.org> wrote:
Issue #2652 has been updated by davshao.
I built the kernel multiple times git reset --hard-ing commits
one-by-one and the problems began exactly with the mentioned commit.
Furthermore I would not make any assumption the hpt27xx driver can't be
partially responsible as does not the hpt_init() call in
sys/dev/raid/hpt27xx/osm_bsd.c call an init_config() function that is in
turn I believed defined as hpt27xx_init_config in hpt27xx_config.h? But
hpt27xx_init_config() is not defined in the source -- is it a function
called from the HighPoint binary blob?
Yes, but I've had the exact same issue (random hangs at boot) independent
of whether the hpt* drivers were compiled in or not. It would just hang at
the "md0: ..." message then.
But a check would be easy. Just move the os_printk() in hpt_init() below
init_config(), or add another message after hpt_init(), and look (in the
case when it hangs) if the message wasn't printed.
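A minimal userland sketch of that check, assuming nothing about the real driver: `trace_marker()`, `TRACE()`, `suspect_call()` and `traced_init()` are hypothetical names used only for illustration, and in the kernel the output call would be kprintf() rather than fprintf(). The point is only the bracketing: if the "before" marker appears and the "after" marker does not, the hang is inside the bracketed call.

```c
#include <assert.h>
#include <stdio.h>

/* Number of markers emitted so far; shows how far execution got. */
static int trace_count;

/*
 * Print a marker and flush immediately, so the message is visible even
 * if the next call never returns. (Hypothetical helper; a kernel version
 * would use kprintf() instead.)
 */
static void
trace_marker(const char *func, int line, const char *msg)
{
	assert(msg != NULL);
	fprintf(stderr, "TRACE %s:%d: %s\n", func, line, msg);
	fflush(stderr);
	trace_count++;
}

#define TRACE(msg) trace_marker(__func__, __LINE__, (msg))

/* Hypothetical stand-in for a call suspected of hanging. */
static int
suspect_call(void)
{
	return 0;
}

/* Bracket the suspect: only "before" printed means the hang is inside it. */
static int
traced_init(void)
{
	int error;

	TRACE("before suspect_call");
	error = suspect_call();
	TRACE("after suspect_call");
	return error;
}
```

Calling `traced_init()` once emits two markers when the bracketed call returns; a hang would leave only the first one on the console.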
Sascha
Updated by davshao over 10 years ago
Here's what I have determined using kprintf()s.
The hang occurs in xpt_init(), which is called by cam_module_event_handler() on MOD_LOAD in sys/bus/cam/cam_xpt.c.
The hang occurs at the call to register_swi() in xpt_init().
The hang occurs at the call to int_moveto_destcpu() in register_int() in sys/kern/kern_intr.c.
The Intel Atom N270 has hyper-threading and thus has ncpus 2. Perhaps it is a somewhat unfortunate accident
that SWI_CAMBIO is 195 (FIRST_SOFTINT + 3), forcing the moveto.
The hang occurs at the call to lwkt_migratecpu() in int_moveto_destcpu() in sys/kern/kern_intr.c.
The hang occurs at the call to lwkt_setcpu_self() in lwkt_migratecpu() in sys/kern/lwkt_thread.c.
(td->td_gd != rgd) 1
(td->td_release != 0) 0
td->td_flags == 3
The hang occurs at the call to lwkt_switch() in lwkt_setcpu_self() in sys/kern/lwkt_thread.c.
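The SWI_CAMBIO remark above can be checked with a little arithmetic. This is only a sketch: FIRST_SOFTINT = 192 follows from the values quoted in the report (195 = FIRST_SOFTINT + 3), but the rule that the handler thread defaults to CPU `intr % ncpus` is an assumption here, not something the report confirms, and `swi_default_cpu()` and `swi_needs_migration()` are hypothetical helper names.

```c
#include <assert.h>

/* From the report: SWI_CAMBIO is 195 and equals FIRST_SOFTINT + 3. */
#define FIRST_SOFTINT	192
#define SWI_CAMBIO	(FIRST_SOFTINT + 3)

/*
 * ASSUMPTION (not confirmed by the report): when no explicit CPU is
 * requested, the handler thread is placed on CPU (intr % ncpus).
 */
static int
swi_default_cpu(int intr, int ncpus)
{
	assert(ncpus > 0);
	return intr % ncpus;
}

/* Returns 1 if registering from cur_cpu would require a migration. */
static int
swi_needs_migration(int intr, int ncpus, int cur_cpu)
{
	return swi_default_cpu(intr, ncpus) != cur_cpu;
}
```

Under that assumption, an odd interrupt number such as 195 lands on CPU 1 when ncpus is 2, so a registration running on CPU 0 has to migrate, while with ncpus 1 no move is ever needed.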
Updated by swildner over 10 years ago
Can you post a diff with your kprintf()s somewhere? I don't quite understand yet why it would initialize CAM at this point.
Updated by davshao over 10 years ago
I got the idea for examining cam_xpt.c because I saw in
sys/dev/raid/hpt27xx/osm_bsd.c the line
SYSINIT;
Then doing a search for all other uses of SI_SUB_CONFIGURE, I saw in
sys/bus/cam/cam_xpt.c
DECLARE_MODULE(cam, cam_moduledata, SI_SUB_CONFIGURE, SI_ORDER_SECOND);
and observed that the manpage for DECLARE_MODULE says it invokes SYSINIT.
By the way, FreeBSD 10.0 at least has a
TUNABLE_INT("kern.cam.boot_delay", &xsoftc.boot_delay);
SYSCTL_INT(_kern_cam, OID_AUTO, boot_delay, CTLFLAG_RDTUN,
&xsoftc.boot_delay, 0, "Bus registration wait time");
in its implementation of cam_xpt.c
Updated by davshao over 10 years ago
- File xpt_init_tsleep.diff added
The following patch to sys/bus/cam/cam_xpt.c, which adds a tsleep() before the call to register_swi(), enables the Lenovo S10 (i386 Intel Atom N270) to boot successfully again with
set hint.acpi.0.disabled=1
set kern.cam.init_delay=3000
Considering other problems such as
http://bugs.dragonflybsd.org/issues/2653
this won't be the last tsleep that needs to be added for this particular machine.
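For readers without the attachment, here is a userland sketch of the general shape of such a workaround. It is not the attached diff: `cam_init_delay_ms`, `clamp_delay_ms()`, `sleep_ms()`, `register_handler()` and `init_with_delay()` are hypothetical names, nanosleep() stands in for the kernel's tsleep(), and treating the tunable's value as milliseconds is an assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the kern.cam.init_delay tunable (assumed ms). */
static int cam_init_delay_ms = 0;

/* Set once the (stand-in) handler registration has run. */
static int registered;

/* Keep a user-supplied delay within a sane range (0 to 10 seconds). */
static int
clamp_delay_ms(int ms)
{
	if (ms < 0)
		return 0;
	if (ms > 10000)
		return 10000;
	return ms;
}

/* Userland stand-in for tsleep(): block for the given number of ms. */
static void
sleep_ms(int ms)
{
	struct timespec ts;

	assert(ms >= 0);
	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (long)(ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for the registration step that used to hang. */
static int
register_handler(void)
{
	registered = 1;
	return 0;
}

/* Wait for the configured delay, then register the handler. */
static int
init_with_delay(void)
{
	int ms = clamp_delay_ms(cam_init_delay_ms);

	if (ms > 0)
		sleep_ms(ms);
	return register_handler();
}
```

With the delay left at 0 the registration runs immediately, so machines that do not need the workaround pay no boot-time cost.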
Updated by swildner over 10 years ago
Can you check whether the two commits by Matt from today, 7da5c4fd138dc6e5035f5d7aff78a14b09b296e8 and mainly 902419bf6d9fd0f80afc9d07cd4b3e99d20f23ca, change anything about this issue?
Thanks,
Sascha
Updated by swildner over 10 years ago
Is this issue still happening to you? If not, can you close this ticket please?