Bug #934

stacked vn(4) borkitude

Added by c.turner over 6 years ago. Updated over 4 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

To make things more flexible, I've started using one largish partition
and creating vn disks for various uses underneath it.

Last night I started to work on updating my vnconfig patch using this
new scheme and got a corrupted filesystem as follows:

- vnconfig -c -s labels vn10 /path/to/home.img
- mount /dev/vn10s0a /home
- cd /home/niftyscriptness
- do some stuff which generates a disk image for vkernels
  (dd, vnconfig, disklabel, newfs, mount, make installworld, etc.),
  which mounts /dev/vn0s0a underneath /home
- strangeness occurs

Basically, it seems like the 1.10.1 VFS/vn is getting confused when a vn
is stacked on top of another vn.

The first time I did this procedure, the first 'mount' resulted in an
error (input/output error). Thinking I might have accidentally done
something wrong with my vn allocation, I started over, and then
started to get weird things in the working directory (layer 1 vn, holds
the mountpoint for layer 2) - the files the vkernel image builder uses
to keep track of things (.formatted, etc) were showing up in 'ls', but
ls -l would say 'no such file or directory'. Thinking a bug was upon me,
I rebooted, and when I tried to fsck the 'layer 1' /home vn, it reported
many errors - 'fsck -y' essentially trashed the filesystem.
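
For anyone trying to confirm the same symptom, the 'shows up in ls but
ls -l says no such file' state can be checked for mechanically. The
following is only a rough sketch, not something from the original test
setup: the function name find_phantom_entries is made up for
illustration, and it assumes the symptom is a directory entry whose
lstat fails with ENOENT.

```python
import errno
import os


def find_phantom_entries(directory):
    """Return names that the directory listing reports but lstat cannot find.

    On a healthy filesystem this list is empty. A non-empty result matches
    the 'ls shows it, ls -l says no such file or directory' symptom.
    """
    phantoms = []
    for name in os.listdir(directory):  # includes dotfiles such as .formatted
        try:
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                phantoms.append(name)
            else:
                raise  # other errors (EIO, EACCES, ...) are a different problem
    return sorted(phantoms)
```

Running find_phantom_entries("/home/niftyscriptness") before and after
the layer 2 mount would show whether the entries go missing at that
point.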

I started to repeat a second round of tests today after restoring /home.

The first time 'worked' - i.e. the initial mount of /dev/vn0s0a into
/dev/vn10s0a's /home filesystem was ok, but the make installworld of
the vkernel system panicked the system mid-way (sorry for the hand-copied
trace - still need to get my debug infrastructure up to date):

panic
ffs_valloc
ufs_makeinode
ufs_create
ufs_vnoperate
vop_old_create
vop_compat_ncreate ? (can't read my writing :)
vop_default
vfs_vnoperate
vop_ncreate
vn_open
kern_open
sys_open
syscall2
Xint80_syscall

when I rebooted, the /home filesystem was ok, so I started the process
again, and got the same kind of corruption as before -

first try, things seemed ok, so I interrupted, unmounted, vnconfig
-u'ed, etc & tried again -

on this try the first mount of the VN (vn0s0a) failed (input/output
error), with a simultaneous console message :

dscheck(#vn/80): attempt to access nonexistent partition

and possibly (saw this at some point):

vn0: reading primary partition table error accessing offset 00000000 for 2

at this point, or shortly thereafter, doing an 'ls' within the layer 1
/home filesystem came back blank, and 'cd .. ; ls -al' started yielding
the 'no such file or directory' strangeness.

I rebooted, and the /home filesystem fsck'ed clean, but mounted empty -
df showed it as being 96% full, however (4G filesystem).
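
The 'mounted empty but 96% full' mismatch can be put into numbers by
comparing what the filesystem reports as allocated with what is actually
reachable through the directory tree. This is a rough sketch only: the
helper names are invented for illustration, it ignores the space taken
by directories and filesystem metadata, and it assumes st_blocks is
counted in 512-byte units.

```python
import os


def used_bytes(mountpoint):
    """Bytes the filesystem itself counts as allocated (roughly what df shows)."""
    st = os.statvfs(mountpoint)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def _on_device(path, dev):
    """True if path can be stat'ed and lives on the device number dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def visible_bytes(mountpoint):
    """Bytes allocated to files reachable by walking the directory tree."""
    root_dev = os.stat(mountpoint).st_dev
    total = 0
    for root, dirs, files in os.walk(mountpoint):
        # Do not descend into other filesystems mounted underneath.
        dirs[:] = [d for d in dirs if _on_device(os.path.join(root, d), root_dev)]
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_blocks * 512
            except OSError:
                pass  # entry cannot be stat'ed; skip it
    return total


def unaccounted_bytes(used, visible):
    """Allocated space that no visible file explains (never negative)."""
    return max(used - visible, 0)
```

A large value from unaccounted_bytes(used_bytes("/home"),
visible_bytes("/home")) on a filesystem that fsck calls clean would be
consistent with the data still being allocated but no longer linked
into the visible tree.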

While typing this, I did realize that the script to create the 'layer 2'
vns was not leaving any label space in the disklabel - that being said,
I don't think that should cause corruption on the 'host' /home
filesystem in any case.

Script was used many times before on a UP 'raw partition' /home -
just switched to a 1.10.1 SMP vn(4) /home - the new machine seems
otherwise stable.

one other note: /home was NFS exported but only mounted during the
initial crash

pointers (or perhaps fixed pointers :) on the next steps welcome..

Thanks in advance,

- Chris

History

#1 Updated by alexh almost 5 years ago

Our vn(4) has undergone some serious modifications, integrating it into the disk
layer. I don't think what is described here can still happen, but it would be
good if someone could confirm that this now works as expected.

Cheers,
Alex Hornung

#2 Updated by alexh over 4 years ago

The disk subsystem makes it possible to stack vn devices without problems.
