Bug #731: system freeze on "slice too large" - DragonFlyBSD - DragonFlyBSD bugtracker

Added by corecode about 18 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee: tuxillo
Category: Kernel
Target version: 6.4

Description

hey,

I've now had a nasty freeze (kind of) twice, with something like this (hand transcribed):

dscheck(#ad/0x20021): slice too large 2/2
..

Then vinum tells me that it has put "build" down, and continues:

fatal: build.p0.s0 read error, offset 33831591936 for 4096 bytes
build.p0.s0: user buffer offset 10209280000 for 4096 bytes

(more "slice too large" messages follow)

Then the namecache prints

blocked on 0xd4fb7b58 "corecode"

and repeats it every 30 seconds or so. The system is inoperable at this point.

Breaking into the debugger works, but dumpsys does not:

dumping to dev #ad/0x20023, blockno 2130432
dump failed, reason: area improper

I'm running:

DragonFly sweatshorts.home.corecode.ath.cx 1.9.0-DEVELOPMENT DragonFly 1.9.0-DEVELOPMENT #14: Sun Jun 17 11:03:58 CEST 2007     corecode@sweatshorts.home.corecode.ath.cx:/usr/build/obj/usr/build/src/sys/SWEATSHORTS  i386

I've attached the relevant outputs.

thanks,
simon

Files

disklabel-ad2s1.txt (527 Bytes), corecode, 07/15/2007 01:11 PM
disklabel-ad4s1.txt (577 Bytes), corecode, 07/15/2007 01:11 PM
fdisk-ad2.txt (741 Bytes), corecode, 07/15/2007 01:11 PM
fdisk-ad4.txt (735 Bytes), corecode, 07/15/2007 01:11 PM
vinum-l.txt (3 KB), corecode, 07/15/2007 01:11 PM

Updated by c.turner about 18 years ago

Simon 'corecode' Schubert wrote:

I'm definitely under the impression that vinum is unhappy with the new
labels.

A few things perhaps worth discussing:

- your 'a' partition overlaps the vinum partition; I assume this means
  you're using a vinum root?
- was your volume set created prior to upgrading past the disklabel
  changes?
- is this an occasional crash, or a constant one?

Just chiming in to get some more data points out on this, I suppose,
because this is most definitely over my kernel-n00b head :)

Test findings here from last night:

- able to create a new disk set (see the other thread)
- unable to use the disk set after a reboot without a 'vinum resetconfig'
  and reloading the config files, or something of that kind (i.e. 'vinum
  start' messes up and needs to be reminded of the disk layout)
- having 'vinum_enable="YES"' in rc.conf keeps me from booting up,
  as it appears to make the kernel very confused about its various
  mount points (e.g. after 'vinum start', it 'forgets' about existing
  'raw partition' mounts and can't find getty, etc.)

When I booted without vinum_enable, 'vinum list' showed no drives;
after running 'vinum start', which caused the 'raw partition amnesia'
to set in, I got some of those dscheck messages like you saw. I then
couldn't drop into the debugger, and Ctrl-Alt-Del resulted in some very
interesting infinite-loop printouts.

Perhaps the mountpoint confusion explains why the dump didn't work for you?

Unfortunately, I don't have the problem machine hooked up to a
serial console yet to snag the output, and due to other events I'm going
to be unable to test for the next few weeks.

Updated by dillon about 18 years ago

I'm getting close to just throwing vinum away. It makes so many bad
assumptions about EVERYTHING I'm getting seriously pissed at it.

vinum's automatic disk search is basically broken.  It is making
ridiculous assumptions about disk layout instead of doing what it
SHOULD be doing, which is to scan available media for partitions
marked as being type 'vinum'.

Try specifying an actual partition being used as a vinum drive and
don't use the auto-start feature.  When I do that, e.g. when I
do 'vinum read /dev/vn0s0d' on my test disk, it finds the saved
configuration.

When I try to do 'vinum start', vinum blows up my system.  Literally
blows it up.  It is clearly leaving structures it has freed still
associated with the kernel.  vinum is such a huge mess I just
don't have any idea where the problem is occurring, but I think it
may be related to how vinum tries to scan available disk media.

So far the only 'slice too large' errors I've gotten occur when I
try to do 'vinum start'.  I think it is related to vinum's insistence
on rolling its own everything.  It constructs device numbers right out
of the blue and then tries to use them.

-Matt
                    Matthew Dillon
                    <dillon@backplane.com>

Updated by dillon about 18 years ago

:hey,
:
:i've now had twice a nasty freeze (kind of) with something like this (hand transcribed):
:
:dscheck(#ad/0x20021): slice too large 2/2
:..
:
:then vinum tells me that it put "build" down and continues:
:...

Same thing.  Don't use vinum's autostart feature.  Try specifying an
actual partition being used as a vinum drive.

I don't know regarding the read errors.  That sounds like a different
issue.

-Matt

Updated by corecode about 18 years ago

This is not a new kernel; vinum has worked here for at least 3 years. The messages do not appear at bootup: I am using root on vinum, so I have to specify the drives it should use.

Wouldn't dscheck return an error that makes the physical read fail? That's probably why vinum gets a read error and shuts down the system.

cheers
simon

Updated by corecode about 18 years ago

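> I'm definitely under the impression that vinum is unhappy with the new
> labels.

This is an old kernel, no new labels here.

> - your 'a' partition overlaps the vinum - I assume this means
>   you're using a vinum root?

Yes.

> - was your volume set created prior to upgrading past the disklabel
>   changes?

It was created about 3 years ago, I think.

> - is this an occasional crash, or a constant one?

It happened twice, within 12 hours or so.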

cheers
simon

Updated by dillon about 18 years ago

:This is an old kernel, no new labels here.
:
:...
:yes.
:
:> - was your volume set created prior to upgrading past the disklabel
:> changes?
:
:it was created about 3 years ago, I think.
:
:> - is this an occasional crash, or a constant one?
:
:it happened two times, within 12 hours or so.
:
:cheers
: simon

I think there are two different issues here.  This issue:

:i've now had twice a nasty freeze (kind of) with something like this (hand transcribed):
:
:dscheck(#ad/0x20021): slice too large 2/2
:..

Is simply vinum trying to access e.g. ad0s2 when ad0 is dangerously
dedicated (i.e. only ad0s0 exists).  When does it do this?  This only
happens to me when I try to have vinum auto-start.

:then vinum tells me that it put "build" down and continues:
:
:fatal: build.p0.s0 read error, offset 33831591936 for 4096 bytes
:build.p0.s0: user buffer offset 10209280000 for 4096 bytes
:
:(more slice too large follow)
:
:then, the namecache does
:
:blocked on 0xd4fb7b58 "corecode"

Read error sounds like a real read error.  Is that a valid offset?
It's a byte offset, not a sector number.  Is there any disk redundancy?

I'm not sure what the two offsets are.  They must be relative to
different things.  The first is probably relative to the vinum volume,
while the second looks like it may be partition-relative.

A real read error can cause UFS to get confused and create a namecache
blockage.

-Matt

Updated by c.turner about 18 years ago

Matthew Dillon wrote:

I didn't see that reaction coming :)

Confirming this works for a striped (raid0) setup using:

'vinum read /dev/ad1 /dev/ad3'

Using particular slices / partitions did not work.
This is the same configuration as before, with ad1s1a and ad3s1a as 'vinum'
partitions spanning each whole disk. I tried '/dev/ad1s1' / '/dev/ad3s1' and
'/dev/ad1s1a' / '/dev/ad3s1a'.

I had initially read in just one disk at a time, leaving the stripe in a
'flaky' state.

This persisted across reboots until I did a

'vinum setdaemon 1; vinum setstate up ...; vinum saveconfig'

to get things back, which held across reboots as well,
so saving the configuration, etc. back to the disk seems OK.
As I mentioned, my schedule will prevent testing for a while;
my apologies. The current state 'works for me' for the time being,
since I don't have a vinum root machine.

I've not gotten a chance to read the code yet.. do you see any other
problems with it, aside from the disk assumptions & general 'bit rot' /
need for TLC?

For some reason I like it,
although perhaps I just like saying the word 'vinum' .. hmm.

Updated by c.turner about 18 years ago

Did some more investigating, and it looks like 'vinum start'
essentially calls 'vinum read' after a devstat(3) iteration to find
'suitable devices', which, based on a quick test program, appear to be the
actual disk devices (e.g. adN / sdN, etc.). The difference seems to be
that 'vinum read' with no arguments probes things differently
than 'vinum read' with explicit device arguments, so I'll keep digging. Perhaps a
short-term fix related to this might 'hold things' for the time being,
until the 'direct partition' scan you mention can be implemented.

Also, there is some kgetenv related to vinum.disks checked during the
load, so perhaps what you'd mentioned in the other thread about adding
this for boot time isn't needed? I haven't tested it yet and I'm not sure
whether this was from your earlier patch or not; I still need to dig further.
I will be attempting to do so over the next few days, hopefully.

Updated by alexh almost 16 years ago

There have been a lot of changes relating to vinum and the disk system
(including dscheck). This probably isn't relevant anymore, but I'd
appreciate someone confirming it.

Cheers,
Alex Hornung

Updated by tuxillo over 11 years ago

Description updated
Category set to Kernel
Assignee changed from 0 to tuxillo
Target version set to 3.8

Grab.

Updated by corecode over 11 years ago

File signature.asc added

Is vinum still supported?

On 02/20/2014 04:46 PM, bugtracker-admin@leaf.dragonflybsd.org wrote:

Issue #731 has been updated by tuxillo.

Description updated
Category set to Kernel
Assignee changed from 0 to tuxillo
Target version set to 3.8.0

Grab.
