Bug #1433

booting hangs at "Waiting for SCSI devices to settle"

Added by pavalos almost 15 years ago. Updated over 14 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

I'm having a really hard time booting one of my machines. It keeps
getting stuck at:

Waiting 2 seconds for SCSI devices to settle

I'm able to break to debugger. Here's what a ps looks like:

CPU0 stopping CPUs: 0x00000002
stopped
Stopped at siointr1+0xdc: movl $0,brk_state2.11395
db> ps
pid lwp uid ppid pgrp pflag lflag stat wmesg wchan cmd
1 d397d730 0 0 0 000200 000000 1 1 swapper
0 c03dfc20 0 0 0 000200 000000 2 3 conifhk c03c1220 swapper
oops, ran out of processes early!
cpu 0 tdrunqmask 00000000 curthread 0xff8003c0 reqflags 0000
INCOMING IPIQS:
tdq thread pid flags pri/cs/mp sp wmesg comm

tdq     thread pid    flags pri/cs/mp        sp    wmesg comm
32 0xff8003c0 -1 00200001 12/02/02 0xff807cc4 - idle_0
32 0xc03dfd40 0 00020080 12/02/01 0xc0573c98 conifhk swapper
32 0xc03ea200 -1 00010400 12/01/01 0xc3f2cd88 - ithread emerg
32 0xc03e0da4 -1 00210400 28/01/00 0xc3f2fd88 - ithread 0
32 0xc04654b4 -1 00010400 12/01/01 0xc3f32d88 - softclock 0
32 0xc03e5c44 -1 00210400 14/01/00 0xd73c9d88 - ithread 68
32 0xc045c0a0 -1 00000400 12/01/01 0xd73ccd88 - usched 0
32 0xc045d104 -1 00000400 12/01/01 0xd73d2d88 - dsched 0
32 0xda3ba500 1 00000600 10/01/01 0xd73d8d34 - swapper
32 0xc0469580 -1 02240400 12/02/00 0xd73dbcc8 waitport netisr_cpu 0
32 0xc0467c40 -1 02240400 12/02/00 0xda35ecc8 waitport ifnet 0
32 0xda3ba600 -1 00020680 12/03/01 0xda3c5ca4 tqthr taskqueue 0
32 0xc03e5d6c -1 00210400 14/01/00 0xda3cbd88 - ithread 69
32 0xda3ba800 -1 00020e80 12/02/01 0xda3cec9c ccb_scan xpt_thrd
32 0xc03e5b1c -1 00210400 14/01/00 0xda3d1d88 - ithread 67
32 0xda3ba900 -1 00040e00 12/02/01 0xda3d4ccc waitport acpi_task
32 0xc03e180c -1 00210400 28/01/00 0xda3dbd88 - ithread 9
32 0xc03e57a4 -1 00210400 14/02/01 0xda3decf0 - ithread 64
32 0xc03e1244 -1 00210400 28/01/00 0xda582d88 - ithread 4
32 0xc03e111c -1 00210400 28/01/00 0xda585d88 - ithread 3
32 0xc03e1494 -1 00210400 28/02/01 0xda588cf0 - ithread 6
32 0xc03e2274 -1 00210400 28/01/00 0xda67dd88 - ithread 18
32 0xda3bac00 -1 00020e80 12/02/01 0xda6a2c94 idle aic_recovery0
32 0xc03e239c -1 00210400 28/01/00 0xda6f7d88 - ithread 19
32 0xda3bad00 -1 00020e80 12/02/01 0xda6fac94 idle aic_recovery1
32 0xc03e2024 -1 00210400 28/01/00 0xda722d88 - ithread 16
32 0xc03e214c -1 00210400 28/01/00 0xda74ad88 - ithread 17
32 0xda3baf00 -1 00020680 12/02/01 0xda74dc8c rwait random
32 0xc048fa80 -1 02240400 12/02/00 0xda754cc8 waitport udp_thread 0
32 0xc048eac0 -1 02240400 12/02/00 0xda75acc0 waitport tcp_thread 0
32 0xda3bbb00 -1 00040600 12/02/01 0xde51accc waitport rtable_cpu 0
cpu 1 tdrunqmask 00000000 curthread 0xff8083c0 reqflags 0000
INCOMING IPIQS:
tdq thread pid flags pri/cs/mp sp wmesg comm
tdq     thread pid    flags pri/cs/mp        sp    wmesg comm
32 0xff8083c0 -1 00200001 12/00/00 0xff80fd3c - idle_1
32 0xc04655c4 -1 00010400 14/02/01 0xc3f35d24 - softclock 1
32 0xc045c1a4 -1 00000400 12/01/01 0xd73cfd88 - usched 1
32 0xc045d208 -1 00000400 12/01/01 0xd73d5d88 - dsched 1
32 0xc046967c -1 02240400 12/02/00 0xd73decc8 waitport netisr_cpu 1
32 0xc0467d3c -1 02240400 12/02/00 0xda3c2cc8 waitport ifnet 1
32 0xda3ba700 -1 00020680 12/03/01 0xda3c8ca4 tqthr taskqueue 1
32 0xc048fb7c -1 02240400 12/02/00 0xda757cc8 waitport udp_thread 1
32 0xc048ebbc -1 02240400 12/02/00 0xda75dcc0 waitport tcp_thread 1
32 0xda3bbc00 -1 00040600 12/02/01 0xde51dccc waitport rtable_cpu 1
CURCPU 0 CURTHREAD 0xff8003c0 (-1)

I have to keep rebooting the machine until it finally decides to work.
What else can I try?

--Peter

Actions #1

Updated by dillon almost 15 years ago

:I'm having a really hard time booting one of my machines. It keeps
:getting stuck at:
:
:Waiting 2 seconds for SCSI devices to settle
:
:I'm able to break to debugger. Here's what a ps looks like:

One of the CAM devices is not returning from its XPT_SCAN_BUS action.
You have to find out which one.
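As a rough illustration of how the thread list in the report can be searched for the stuck scan, here is a minimal Python sketch. The rows are copied from the ps output above; the helper names `parse_threads` and `waiting_on` are hypothetical, and the column layout (tdq, thread, pid, flags, pri/cs/mp, sp, wmesg, comm) is taken from the header line of that output.

```python
# A few rows copied verbatim from the cpu 0 thread list in the report.
PS_ROWS = """\
32 0xff8003c0 -1 00200001 12/02/02 0xff807cc4 - idle_0
32 0xc03e0da4 -1 00210400 28/01/00 0xc3f2fd88 - ithread 0
32 0xc0469580 -1 02240400 12/02/00 0xd73dbcc8 waitport netisr_cpu 0
32 0xda3ba800 -1 00020e80 12/02/01 0xda3cec9c ccb_scan xpt_thrd
32 0xda3bac00 -1 00020e80 12/02/01 0xda6a2c94 idle aic_recovery0
32 0xda3bad00 -1 00020e80 12/02/01 0xda6fac94 idle aic_recovery1
"""


def parse_threads(text):
    """Turn each row into a dict; the comm column may contain spaces."""
    threads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip headers, blank lines, or truncated rows
        threads.append({
            "thread": fields[1],
            "wmesg": fields[6],
            "comm": " ".join(fields[7:]),
        })
    return threads


def waiting_on(threads, wmesg):
    """Return the threads blocked on the given wait message."""
    return [t for t in threads if t["wmesg"] == wmesg]


threads = parse_threads(PS_ROWS)
scanners = waiting_on(threads, "ccb_scan")
for t in scanners:
    print(t["thread"], t["comm"])
```

This only shows that xpt_thrd is the thread waiting on ccb_scan. It does not say which bus or device the scan is stuck on.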

: 32 0xda3ba800 -1 00020e80 12/02/01 0xda3cec9c ccb_scan xpt_thrd
:...
: 32 0xc03e2274 -1 00210400 28/01/00 0xda67dd88 - ithread 18
: 32 0xda3bac00 -1 00020e80 12/02/01 0xda6a2c94 idle aic_recovery0
: 32 0xc03e239c -1 00210400 28/01/00 0xda6f7d88 - ithread 19
: 32 0xda3bad00 -1 00020e80 12/02/01 0xda6fac94 idle aic_recovery1
: 32 0xc03e2024 -1 00210400 28/01/00 0xda722d88 - ithread 16
:...
:
:I have to keep rebooting the machine until it finally decides to work.
:What else can I try?
:..
:--Peter

We have to figure out which SCSI device is having the problem.  Any
AHCI or USB stuff plugged in? Or is the only CAM bus from the AIC
driver?
-Matt

Actions #2

Updated by dillon almost 15 years ago

Also, Peter, do a boot -v. Depending on whether the boot fails or
succeeds, you should see a different set of CAM debug messages on the
console, which will hopefully let you track down which CAM device is
having the problem and/or where.

-Matt

Actions #3

Updated by pavalos almost 15 years ago

On Tue, Jul 21, 2009 at 09:43:14AM -0700, Matthew Dillon wrote:

> We have to figure out which SCSI device is having the problem. Any
> AHCI or USB stuff plugged in? Or is the only CAM bus from the AIC
> driver?

No USB or AHCI.

# camcontrol devlist -v
    scbus0 on ahd0 bus 0:
    < > at scbus0 target -1 lun -1 ()
    scbus1 on ahd1 bus 0:
    <SEAGATE ST336607LC 0007> at scbus1 target 0 lun 0 (pass0,da0)
    <SEAGATE ST3146854LC D403> at scbus1 target 1 lun 0 (pass1,da1)
    < > at scbus1 target -1 lun -1 ()
    scbus-1 on xpt0 bus 0:
    < > at scbus-1 target -1 lun -1 (xpt0)

--Peter

Actions #4

Updated by dillon almost 15 years ago

:No USB or AHCI.
:...
:--Peter

I'm looking at the code trying to figure it out.

When it gets stuck, try breaking into DDB and doing:

    print *busses_to_config
    print *busses_to_reset

If it can't find the symbols, try compiling a kernel with those
variables as globals instead of statics, in bus/cam/cam_xpt.c.

Maybe also try augmenting xptconfigfunc(), xpt_finishconfig(), and so
forth with kprintf's.

It kinda looks to me like ahd might either be getting stuck on
an XPT action or may be losing track of a request and never calling
xpt_done() on it.
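
The "lost request" hypothesis can be pictured with a toy model in Python. This is not the code in bus/cam/cam_xpt.c. It assumes only that boot-time configuration finishes once every bus has reported completion, and the `ConfigTracker` class, its methods, and the choice of which bus goes missing are all hypothetical.

```python
class ConfigTracker:
    """Toy model: configuration finishes only when no bus is pending."""

    def __init__(self, buses):
        self.pending = set(buses)

    def done(self, bus):
        # Stands in for the driver completing its request
        # (the role xpt_done() plays in the discussion above).
        self.pending.discard(bus)

    def finished(self):
        return not self.pending


# Bus names taken from the camcontrol output; which one hangs is an
# assumption made purely for illustration.
tracker = ConfigTracker(["scbus0", "scbus1"])
tracker.done("scbus1")  # this bus completes its scan
# No completion ever arrives for scbus0: the hypothesized lost request.

print("finished:", tracker.finished())
print("still pending:", sorted(tracker.pending))
```

In this model a single missing completion keeps `finished()` false forever, which would match a boot that waits indefinitely. Printing the pending set plays the same role as inspecting the counters from DDB.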
-Matt

Actions #5

Updated by pavalos over 14 years ago

Fixed in e037acafa7b95d0af56ea978bad8678bb08c6823.
