Bug #2994
Intermittent boot hangs after git: hammer - HAMMER Version 7
Status: open
Description
Bisected to git: hammer - HAMMER Version 7
commit 4c09d9c4fd910651904ede280ad90a4abf3fc5d7
Date: Fri Mar 17 14:06:24 2017 -0700
hammer - HAMMER Version 7
Intermittent boot hangs, stopping at:
ums0: <Logitech USB-PS2 Optical Mouse, class 0/0, rev 2.00/20.00, addr 5> on usb
ums0: 3 buttons and [XYZ] coordinates ID=0
no B_DEVMAGIC (bootdev=0)
Boot proceeds normally when using the verbose option, which is why I can
attach the verbose dmesg.
The machine uses a refurbished Asus P8H77-V motherboard booting via legacy
BIOS, with several internal hard drives holding different OSes.
$ hammer info
Volume identification
Label ROOT
No. Volumes 1
HAMMER Volumes /dev/serno/WD-WMAYP5624974.s3d
Root Volume /dev/serno/WD-WMAYP5624974.s3d
FSID 9d8141cd-c974-11e5-9ccd-bd5ff4499104
HAMMER Version 6
Big-block information
Total 29669
Used 9119 (30.74%)
Reserved 70 (0.24%)
Free 20480 (69.03%)
Space information
No. Inodes 856028
Total size 232G (248881610752 bytes)
Used 71G (30.74%)
Reserved 560M (0.24%)
Free 160G (69.03%)
PFS information
PFS# Mode Snaps
0 MASTER 0 (root PFS)
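The space figures above follow from the big-block counts. Below is a minimal sketch that recomputes them, assuming each HAMMER big-block is 8 MiB. That size is stated here as an assumption, although it reproduces the reported 248881610752-byte total exactly. `space_from_bigblocks` is a hypothetical helper and is not part of the hammer utility.

```python
# Hypothetical helper: recompute the "Space information" section of
# `hammer info` from its "Big-block information" counts.
BIGBLOCK_BYTES = 8 * 1024 * 1024  # ASSUMED HAMMER big-block size: 8 MiB


def space_from_bigblocks(total, used, reserved, free):
    """Return {category: (bytes, percent of total)} for each big-block count."""
    # The three categories should account for every big-block.
    assert used + reserved + free == total, "counts do not add up"
    result = {}
    for name, count in (("total", total), ("used", used),
                        ("reserved", reserved), ("free", free)):
        result[name] = (count * BIGBLOCK_BYTES, round(100.0 * count / total, 2))
    return result


info = space_from_bigblocks(total=29669, used=9119, reserved=70, free=20480)
for name, (nbytes, pct) in info.items():
    print(f"{name:<8} {nbytes:>13} bytes {nbytes / 2**20:>9.0f} MiB {pct:6.2f}%")
```

Under this assumption, Used comes to 72952 MiB (about 71.2 GiB), Reserved to 560 MiB, and Free to exactly 160 GiB. These values match the rounded 71G, 560M and 160G that hammer prints, and the percentages match 30.74%, 0.24% and 69.03%.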
Updated by dillon over 7 years ago
Did it get to the 'CAM: Configuring N busses' and 'CAM: finished configuring all busses' messages before hanging? If not, then it is stopping there, waiting for CAM to finish configuring.
There is a possible deadlock somewhere in CAM resulting from changes made in February to fix some other holes in CAM's locking. The fact that a verbose boot worked makes that all the more likely to be the issue. Kind of a chicken-and-egg problem, since the verbose boot that would help diagnose it makes it go away. What I really need is a backtrace of the offending kernel threads, but that might be difficult at that point in the boot. You might be able to Ctrl-Alt-Esc into the debugger, but it would take some messing around to get a decent trace of the offending threads (basically, you have to find the stack addresses, which are typically listed in the thread list, then run x/a <addr> and hit Return many times to dump the stack for each thread). I don't know if the symbol table will be available at that point in the boot sequence.
-Matt
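If the console output of a hung boot can be captured (for example over a serial console, which is an assumption and not something the report describes), the question above reduces to checking which of the two CAM messages appear. Below is a minimal sketch. `cam_progress` and the sample log are hypothetical, and the matching uses only the message text quoted in the comment.

```python
# Hypothetical helper: classify where a captured boot console log stops,
# relative to the two CAM messages quoted above.
CAM_START = "CAM: Configuring"                      # "CAM: Configuring N busses"
CAM_DONE = "CAM: finished configuring all busses"


def cam_progress(console_log: str) -> str:
    lines = console_log.splitlines()
    started = any(CAM_START in line for line in lines)
    finished = any(CAM_DONE in line for line in lines)
    if finished:
        return "CAM finished; hang is later in boot"
    if started:
        return "stuck waiting for CAM to finish configuring"
    return "hang occurs before the CAM configuration messages"


# Hypothetical capture: CAM started configuring but never reported finishing.
sample_log = "CAM: Configuring 5 busses\n"
print(cam_progress(sample_log))
```

Matching is case-sensitive, so "Configuring" in the first message and "configuring" in the second cannot be confused with each other.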
Updated by davshao over 7 years ago
Apologies, it now seems to me that the problem for my machine may be
with:
commit f240042b9f9bcfdacd70499659fcc5f55d5e49ab
Date: Sun Mar 26 23:35:55 2017 -0700
kernel - Spread IPIs out to more priority levels
Updated by dillon over 7 years ago
Could you re-test with and without that commit? Be sure to do a full kernel rebuild for that one (a world rebuild is not necessary). If that commit is what breaks your machine, then I missed something in it and I will back it out.
-Matt
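Because the hang is intermittent, a single clean boot does not prove that a kernel is good. This applies to the original bisect and to the with/without re-test alike. Below is a minimal sketch of the halving that `git bisect` performs, simplified to a linear list of commits, where each candidate must survive several boots before it is called good. `bisect_intermittent`, `boot_ok` and `trials` are hypothetical stand-ins for building, installing and booting each kernel. The sketch assumes the bug stays present in every later commit once introduced.

```python
from typing import Callable, Sequence


def bisect_intermittent(commits: Sequence[str],
                        boot_ok: Callable[[str], bool],
                        trials: int = 5) -> str:
    """Return the first commit judged bad.

    Assumes commits[0] is known good, commits[-1] is known bad, and the bug
    stays present in every commit after the one that introduced it.
    """
    def is_good(commit: str) -> bool:
        # all() stops at the first failed boot, so one hang marks it bad.
        return all(boot_ok(commit) for _ in range(trials))

    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Simulated history (hypothetical): c6 introduces a hang on every third boot.
boots = {}


def flaky_boot(commit: str) -> bool:
    boots[commit] = boots.get(commit, 0) + 1
    return int(commit[1:]) < 6 or boots[commit] % 3 != 0


history = [f"c{i}" for i in range(10)]
print(bisect_intermittent(history, flaky_boot, trials=5))  # c6
```

Rerunning the simulation with `trials=1` and a fresh boot count settles on c9 instead. This happens because the bad commits c6 through c8 each pass their single boot. More trials per commit lower the chance of such a misattribution but never eliminate it.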