Bug #1463
open
Mountroot before drives are initialized
Description
Hi,
As per Sacha's comment about splitting my previous bug report into separate
ones, here is an issue that has been bugging me on this box (HP Proliant
DL180 G5).
It seems that mountroot is run before da0 (and cd0) gets initialized on
this box. mountroot then complains that da0s1 is missing.
Here is a verbose dmesg:
http://leaf.dragonflybsd.org/mailarchive/bugs/2009-08/msg00128.html.
Note that this problem seems to happen randomly when booting from the live CD,
but happens every time when booting from disk.
Petr
Updated by eric.j.christeson about 15 years ago
I am also running into this problem on a Dell Optiplex GX270 (P4 2.26 GHz). I
have a SCSI drive as my boot/root drive, plus an IDE drive and CD.
I've been tracking HEAD and first noticed the problems after the devfs changes.
At the time I was going a few days between rebuilds so I can't easily pinpoint
the time.
I noticed a few interesting things:
1. booting in verbose mode does NOT result in a mountroot failure
2. at the mountroot prompt, ? doesn't list da0 (root device) the first time, but
will list it subsequent times.
3. at mountroot, specifying root doesn't work the first time I type it. If I type ?
first, or try specifying root twice, it works.
4. Booting with or without a CD in the CD-ROM drive gives the same results
I've got a couple of hours, so I may try to look at this.
Included files:
dmesg.fine.out Verbose boot, no mountroot hang
dmesg.hang Standard boot, note failure after first time specifying rootdev,
strange cd0: message after ? and finding root after specifying rootdev again.
Updated by eric.j.christeson about 15 years ago
Guess you can't submit more than one file at a time.
Updated by dillon about 15 years ago
Do a verbose boot so we can see when CAM starts its probes.
It is probably the SYM driver rejecting the initial bus scan
from CAM, and then later (after it is too late) notifying CAM that
a new bus and/or devices are present asynchronously.
I'm not sure how easy it will be to fix, the SYM driver is 10,000
lines and it will take a few hours to figure out how it deals
with the SCSI bus scan.
-Matt
Updated by eric.j.christeson about 15 years ago
Verbose boot doesn't fail
Likewise if I set vfs.devfs.debug=3 it doesn't fail
setting vfs.devfs.debug=2 fails
dmesg.devfs.2 is a dump with vfs.devfs.debug=2
Updated by eric.j.christeson about 15 years ago
Don't know if this info helps, but setting SCSI_DELAY to 10000 or 20000 had no
effect.
Updated by elekktretterr about 15 years ago
Do a verbose boot so we can see when CAM starts its probes.
It is probably the SYM driver rejecting the initial bus scan
from CAM, and then later (after it is too late) notifying CAM that
a new bus and/or devices are present asynchronously.
I'm not sure how easy it will be to fix, the SYM driver is 10,000
lines and it will take a few hours to figure out how it deals
with the SCSI bus scan.
Hi Matt,
I attached my verbose dmesg to the original email. Do you think our
problems are same? This server uses the ciss driver.
Petr
Updated by elekktretterr about 15 years ago
Also, is there a way to teach mountroot to wait until all drives
are initialized?
Petr
Updated by elekktretterr about 15 years ago
Eric J. Christeson <eric.j.christeson@gmail.com> added the comment:
Don't know if this info helps, but setting SCSI_DELAY to 10000 or 20000
had no effect.
Hi Eric,
I should point out that this problem was already happening before we put
the ciss raid card in it. It was happening with a USB-attached CD drive
too. I saw it being initialized AFTER mountroot was run.
Petr
Updated by eric.j.christeson about 15 years ago
I've been booting with various levels of CAMDEBUG (and it boots fine since the
output gives enough delay for init) and something occurred to me. Do you also
have (n)atapicam compiled in? I noticed that sometimes I would see messages
like this:
WARNING waiting for the following device to finish configuring:
xpt: func=0xc0144b4f arg=0
With CAMDEBUG I see why: xpt has to enumerate all the scsi, ata, and usb
bus/devices in the system. xpt has to init before any of the scsi devs (or it
does, even if it doesn't have to), so I wonder if some of the delay isn't
there. I'm going to take out natapicam and see if things improve or not.
Updated by elekktretterr about 15 years ago
I haven't tried it without natapicam. I'm going to have to install FBSD 7 on
it for now as we are rushing to put this box in the datacentre. I realized
I can't put DragonFly on it because it's going to run pgpool-II. The other
boxes all run a 64-bit OS, but all DragonFly builds are currently still only
32-bit and I was told that pgpool recovery (uses postgres PITR) is
architecture specific.
Petr
Updated by alexh about 15 years ago
Can you please try to boot again with commit
8c05caabb07caf24fd0dfab4f1497fb58a8c31e0, entering "set kern.disk_debug=1"
at the bootloader prompt?
It should give some more insight on the source of the problem. If you feel
like it, you can also set it to 2, so it gives some info on partition probing
for each slice.
Cheers,
Alex Hornung
Updated by hasso about 15 years ago
I have the same issue and also using ciss(4) (HP Proliant DL360 G6). Vanilla
kernel fails 100% here to mount root from harddisk, but with kern.disk_debug=1
it succeeds.
Updated by dillon about 15 years ago
:Hasso Tepper <hasso@estpak.ee> added the comment:
:
:I have the same issue and also using ciss(4) (HP Proliant DL360 G6). Vanilla
:kernel fails 100% here to mount root from harddisk, but with kern.disk_debug=1
:it succeeds.
If Alex doesn't come up with something in the next week or so we will
add a straight-up delay before mountroot.
In fact, could you test that a straight out delay before mountroot works?
Here's a patch.
-Matt
Matthew Dillon
<dillon@backplane.com>
diff --git a/sys/kern/vfs_conf.c b/sys/kern/vfs_conf.c
index a159afc..8bdea67 100644
--- a/sys/kern/vfs_conf.c
+++ b/sys/kern/vfs_conf.c
@@ -109,8 +109,9 @@ SYSINIT;
 static void
 vfs_mountroot(void *junk)
 {
-	int i;
 	cdev_t save_rootdev = rootdev;
+	int i;
+	int dummy;
 
 	/*
 	 * Make sure all disk devices created so far have also been probed,
@@ -121,6 +122,8 @@ vfs_mountroot(void *junk)
 	 * coverage.
 	 */
 	sync_devs();
+	tsleep(&dummy, 0, "syncer", hz*2);
+
 
 	/*
 	 * The root filesystem information is compiled in, and we are
Updated by dillon about 15 years ago
(Oh, when testing the tsleep delay for the mountroot fix, do
it with kern.disk_debug=0 of course).
-Matt
Matthew Dillon
<dillon@backplane.com>
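For reference, the timeout argument of tsleep() in the patch above is given in clock ticks, and hz is the number of ticks per second, so hz*2 works out to a fixed delay of roughly two seconds whatever the tick rate is. Below is a minimal user-space sketch of that arithmetic. The helper names ticks_to_ms and mountroot_delay_ms are made up for illustration and are not kernel APIs.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): convert a tick count to
 * milliseconds for a given tick rate hz (ticks per second).
 */
static long
ticks_to_ms(long ticks, long hz)
{
	assert(hz > 0);
	return (ticks * 1000L / hz);
}

/*
 * Hypothetical helper: the delay used in the patch is hz*2 ticks,
 * which is the same wall-clock time for any value of hz.
 */
static long
mountroot_delay_ms(long hz)
{
	return (ticks_to_ms(hz * 2, hz));
}
```

Calling mountroot_delay_ms() with different tick rates gives the same number of milliseconds each time, which is why the patch behaves as a flat delay rather than something tied to probe progress.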
Updated by alexh about 15 years ago
I don't know how to approach this. The solution lies within cam and scsi_da, I
think. The disk is created on time (disk_create) but setdiskinfo does NOT
occur on time to trigger probing before mountroot.
I'll continue investigating, but I'd welcome any ideas on how to solve this in
the aforementioned direction.
Cheers,
Alex Hornung
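The situation described in this comment (disk created in time, probing finished too late) can be modelled in user space to compare a single lookup against a bounded retry loop. This is only a toy sketch under stated assumptions: struct sim_disk, root_slice_visible and wait_for_root are hypothetical names, time is an abstract tick counter, and none of this is existing DragonFly kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a disk: created at one tick, fully probed at a later one. */
struct sim_disk {
	int created_tick;	/* when the disk itself shows up */
	int probed_tick;	/* when its slices (e.g. da0s1) become visible */
};

/* True once the slice we want to mount is visible at time `now`. */
static bool
root_slice_visible(const struct sim_disk *d, int now)
{
	assert(d->probed_tick >= d->created_tick);
	return (now >= d->probed_tick);
}

/*
 * Poll once per tick, starting at start_tick, for at most max_wait
 * extra ticks. Returns the tick at which the root slice was found,
 * or -1 if it never appeared within the allowed wait.
 */
static int
wait_for_root(const struct sim_disk *d, int start_tick, int max_wait)
{
	assert(max_wait >= 0);
	for (int waited = 0; waited <= max_wait; waited++) {
		int now = start_tick + waited;

		if (root_slice_visible(d, now))
			return (now);
	}
	return (-1);
}
```

With max_wait set to 0 the function behaves like a single lookup and fails whenever probing completes after mountroot starts. A non-zero max_wait succeeds as soon as the slice appears, without always paying the full delay.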
Updated by eric.j.christeson about 15 years ago
This is no longer an issue for me as of DragonFly v2.3.2.780.g8f3b7d-DEVELOPMENT
which I built on Friday. Boots fine, for now. It will probably crop up again.
I would be happy to test patches.
Updated by alexh about 15 years ago
This is temporarily fixed in a2b579620dd7947b8f90d1311c7e87a57ec9f1ea. Leaving
open because the problem itself wasn't addressed.
Cheers,
Alex Hornung
Updated by alexh almost 14 years ago
Are there any suggestions on how to deal with it? The commit is just a hack
that'll delay the boot up altogether. Is there any way to know if something is
still being probed, or pending probing?
Regards,
Alex
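One possible shape for "knowing whether something is still being probed" is a counter of outstanding probes that mountroot checks before proceeding. The following is a minimal user-space sketch, not a proposal for the actual kernel interfaces: struct probe_tracker, probe_begin, probe_end, probes_idle and mountroot_may_proceed are all hypothetical names, and a real kernel version would need locking or atomics plus a sleep/wakeup mechanism instead of a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping: how many device probes are queued or running. */
struct probe_tracker {
	int pending;
};

/* Call when a probe is queued. */
static void
probe_begin(struct probe_tracker *t)
{
	t->pending++;
}

/* Call when a probe completes, whether it succeeded or not. */
static void
probe_end(struct probe_tracker *t)
{
	assert(t->pending > 0);
	t->pending--;
}

/* True when no probe is queued or in progress. */
static bool
probes_idle(const struct probe_tracker *t)
{
	return (t->pending == 0);
}

/*
 * Gate for mountroot: proceed as soon as all known probes are done,
 * or once max_wait ticks have passed so a stuck probe cannot hang boot.
 */
static bool
mountroot_may_proceed(const struct probe_tracker *t, int waited, int max_wait)
{
	return (probes_idle(t) || waited >= max_wait);
}
```

A counter like this only covers probes that have already been queued. A driver that announces a new bus asynchronously after the initial scan, as suggested earlier in this thread, would not be counted yet, so some timeout or grace period would probably still be needed.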