Bug #2994

Intermittent boot hangs after git: hammer - HAMMER Version 7

Added by davshao 4 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 03/28/2017
Due date:
% Done: 0%


Description

Bisected to git: hammer - HAMMER Version 7
commit 4c09d9c4fd910651904ede280ad90a4abf3fc5d7
Date: Fri Mar 17 14:06:24 2017 -0700

hammer - HAMMER Version 7

Intermittent boot hangs, stopping at:

ums0: <Logitech USB-PS2 Optical Mouse, class 0/0, rev 2.00/20.00, addr 5> on usb
ums0: 3 buttons and [XYZ] coordinates ID=0
no B_DEVMAGIC (bootdev=0)

Boot proceeds normally with the verbose option, which is why I can attach
the verbose dmesg.

The machine uses a refurbished Asus P8H77-V motherboard with several
internal hard drives with different OSes using legacy BIOS.

$ hammer info
Volume identification
    Label               ROOT
    No. Volumes         1
    HAMMER Volumes      /dev/serno/WD-WMAYP5624974.s3d
    Root Volume         /dev/serno/WD-WMAYP5624974.s3d
    FSID                9d8141cd-c974-11e5-9ccd-bd5ff4499104
    HAMMER Version      6
Big-block information
    Total               29669
    Used                9119 (30.74%)
    Reserved            70 (0.24%)
    Free                20480 (69.03%)
Space information
    No. Inodes          856028
    Total size          232G (248881610752 bytes)
    Used                71G (30.74%)
    Reserved            560M (0.24%)
    Free                160G (69.03%)
PFS information
    PFS#  Mode    Snaps
    0     MASTER  0 (root PFS)
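
As a sanity check on the figures above, the big-block counts and the space figures can be cross-checked against each other. This is only a minimal sketch: it assumes a HAMMER big-block is 8 MiB (an assumption that the reported byte total happens to agree with), and the helper names `percent` and `mib` are made up for illustration.

```python
# Figures copied from the `hammer info` output in this report.
BIG_BLOCK_BYTES = 8 * 1024 * 1024  # assumed HAMMER big-block size: 8 MiB
TOTAL, USED, RESERVED, FREE = 29669, 9119, 70, 20480
REPORTED_TOTAL_BYTES = 248881610752


def percent(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


def mib(big_blocks: int) -> int:
    """Convert a big-block count to MiB under the 8 MiB assumption."""
    return big_blocks * BIG_BLOCK_BYTES // (1024 * 1024)


if __name__ == "__main__":
    print("blocks add up:", USED + RESERVED + FREE == TOTAL)
    print("total bytes match:", TOTAL * BIG_BLOCK_BYTES == REPORTED_TOTAL_BYTES)
    for name, count in (("Used", USED), ("Reserved", RESERVED), ("Free", FREE)):
        print(f"{name:9}{count:6} {percent(count, TOTAL):>7} {mib(count):7} MiB")
```

Under that assumption the counts, percentages, and sizes agree with one another, so the volume accounting itself does not look inconsistent. Note that the volume still reports HAMMER Version 6.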

p8h77v_dmesg.txt (63.3 KB) davshao, 03/28/2017 11:44 AM

History

#1 Updated by dillon 4 months ago

Did it get to the 'CAM: Configuring N busses' and 'CAM: finished configuring all busses' before hanging? If not then it will stop there waiting for CAM to finish configuring.

There is a possible deadlock somewhere in CAM from changes made in Feb to fix some other holes in CAM's locking. The fact that a verbose boot worked makes that all the more likely to be the issue. Kind of a chicken and egg issue. What I really need is a backtrace of the offending kernel threads, but that might be difficult at that point in the boot. You might be able to ctl-alt-esc into the debugger, but it would take some messing around to get a decent trace of the offending threads (basically you have to find the stack addresses, which are typically listed in the thread list, then x/a <addr> and hit return a ton of times to dump the stack for each thread). I don't know if the symbol table will be available at that point in the boot sequence.

-Matt

#2 Updated by davshao 4 months ago

Apologies, it now seems to me that the problem for my machine may be
with:

commit f240042b9f9bcfdacd70499659fcc5f55d5e49ab
Date: Sun Mar 26 23:35:55 2017 -0700

kernel - Spread IPIs out to more priority levels
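
Since the hang is intermittent, a bisect step can mark a bad kernel as good when its few test boots all happen to succeed, which is one way a bisect can land on the wrong commit. A rough sketch of that arithmetic follows. The per-boot hang rate is a hypothetical input (it was not measured in this report), and the function names are made up for illustration.

```python
import math


def false_good_probability(hang_rate: float, boots: int) -> float:
    """Chance that `boots` consecutive boots of a bad kernel all succeed."""
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate: float, max_false_good: float) -> int:
    """Smallest number of clean boots that keeps the false-good chance
    at or below `max_false_good`, assuming independent boots."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    # Hypothetical hang rates, purely for illustration.
    for rate in (0.5, 0.25, 0.1):
        n = boots_needed(rate, 0.05)
        print(f"hang rate {rate:.2f}: {n} clean boots "
              f"(false-good chance {false_good_probability(rate, n):.3f})")
```

The practical point is that each candidate commit needs several boots before it is called good, and more of them the rarer the hang is.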

#3 Updated by dillon 4 months ago

Could you re-test with and without that commit? Be sure to do a full kernel rebuild for that one (world rebuild not necessary). If that is breaking you then I missed something in that commit and I will back it out.

-Matt
