Bug #1218

panic: assertion: error == 0 in hammer_start_transaction

Added by rumcic over 5 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

The machine was mostly idle and the panic happened during the night while I was
asleep, so I don't remember much about what was running.
The only thing I do remember is a vkernel running under gdb. It was
semi-diskless (root on NFS, plus one HAMMER fs partition on the first vkd), but
since it was stuck at the RPC timeout messages it shouldn't have gotten far
enough to mount the root, let alone the local HAMMER partition, unless it
started booting sometime during the night.

(I still have to find out why a diskless vkernel run outside gdb displays quite
a few "RPC timeout for server 192.168.0.16" messages before the network starts
working and it goes on booting, whereas under gdb I get perhaps 2 of those
messages and after that nothing happens and it gets stuck.)

The backtrace:
panic: assertion: error == 0 in hammer_start_transaction
mp_lock = 00000000; cpuid = 0
Trace beginning at frame 0xe28c9968
panic(e28c998c,c02c6806,e28c9a84,c39a6738,e28c99a8) at panic+0x14d
panic(c03d0698,c03de4df,c03bbd41,6,45d61) at panic+0x14d
hammer_start_transaction(e28c9a84,debc0000,c39a6738,1,1) at hammer_start_transaction+0x41
hammer_ioctl(de0bc550,c02c6806,e28c9c1c,1,c39a6738) at hammer_ioctl+0x2d
hammer_vop_ioctl(e28c9ae0,c04314e0,d272ad10,e27c46e8,0) at hammer_vop_ioctl+0x2f
vop_ioctl(d272ad10,e27c46e8,c02c6806,e28c9c1c,1) at vop_ioctl+0x38
vn_ioctl(d61e90c0,c02c6806,e28c9c1c,c39a6738,d61e90c0) at vn_ioctl+0xbf
mapped_ioctl(4,c02c6806,bfbff8e0,0,e28c9d34) at mapped_ioctl+0x3e1
sys_ioctl(e28c9cf0,6,1e82,0,d8f675d8) at sys_ioctl+0x16
syscall2(e28c9d40) at syscall2+0x265
Xint0x80_syscall() at Xint0x80_syscall+0x36
boot() called on cpu#0

The dump is located at leaf:~rumko/crash/{kernel,vmcore}.0

The kernel was compiled on the 2nd January around noon CET ... so the sources
should have been from around then as well.
--
Regards,
Rumko

History

#1 Updated by dillon over 5 years ago

:...
:few "RPC timeout for server 192.168.0.16" before the network starts working
:and goes on booting, but if I run the vkernel under gdb, I only get perhaps 2
:of those messages and after that nothing happens - gets stuck) which was
:semi-diskless (root on nfs, and one hammer fs partition on the first vkd, but
:since it was stuck at the RPC timeout messages it shouldn't have gotten far
:enough to mount the root, let alone the local hammer partition - unless it
:started booting sometime during the night).
:
:The backtrace:
:panic: assertion: error == 0 in hammer_start_transaction
:mp_lock = 00000000; cpuid = 0
:Trace beginning at frame 0xe28c9968
:panic(e28c998c,c02c6806,e28c9a84,c39a6738,e28c99a8) at panic+0x14d
:...
:The dump is located at leaf:~rumko/crash/{kernel,vmcore}.0
:
:The kernel was compiled on the 2nd January around noon CET ... so the sources
:should have been from around then as well.
:--
:Regards,
:Rumko

Looking at the core, the error code was 6 (ENXIO), which implies that
the block device underlying the HAMMER filesystem went away.
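For readers decoding a similar dump, the mapping from the raw value to the symbolic name can be checked against <errno.h>. The following is only a minimal sketch: errno_name() is a hypothetical helper written for this illustration and is not part of the kernel or of HAMMER. It covers just a few values.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or HAMMER function): translate a raw
 * error number read from a crash dump into its symbolic errno name.
 * Only a handful of values are covered in this sketch.
 */
static const char *errno_name(int error)
{
    switch (error) {
    case ENOENT: return "ENOENT";
    case EIO:    return "EIO";
    case ENXIO:  return "ENXIO";   /* 6 on the BSDs and on Linux */
    default:     return "unknown";
    }
}
```

Calling errno_name(6) returns "ENXIO"; strerror(ENXIO) gives the system's human-readable text for the same code, whose wording differs between systems.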

It looks like a HAMMER mount on /mnt, backed by a VN device
(/dev/vn0s1a):

f_mntonname = "/mnt", '\0' <repeats 75 times>,
f_mntfromname = "VROOT", '\0' <repeats 74 times>,

vol_name = 0xe1c84080 "/dev/vn0s1a",

My guess is that your VN device is backed by a file over NFS
and NFS errored out.

-Matt
Matthew Dillon

#2 Updated by rumcic over 5 years ago

Matthew Dillon wrote:

Ah damn. In that case, never mind. I wonder what I was doing, hm.
--
Regards,
Rumko

#3 Updated by corecode over 5 years ago

Still shouldn't panic or something, no?

cheers
simon

#4 Updated by rumcic over 5 years ago

Simon 'corecode' Schubert wrote:

Well it would be lovely if it wouldn't panic, but at least I have a faint idea
what caused it and will be more careful in the future.
--
Regards,
Rumko

#5 Updated by dillon over 5 years ago

:Well it would be lovely if it wouldn't panic, but at least I have a faint idea
:what caused it and will be more careful in the future.
:--
:Regards,
:Rumko

Various error paths would have to be added to the transaction API
to allow it to return an error and abort the sequence, instead
of panicking there. I'd rather not mess with it now, but I guess we
will want to deal with it at some point in the future.
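As a rough illustration of what "return an error and abort the sequence" could look like, here is a minimal userland sketch. Every name in it (sketch_volume, sketch_trans, sketch_start_transaction, sketch_done_transaction, sketch_ioctl) is hypothetical and invented for this example. It is not the HAMMER transaction API and not a proposed patch; it only shows the general pattern of propagating the error to the caller instead of asserting.

```c
#include <errno.h>
#include <stddef.h>

/* All names here are hypothetical; this is not HAMMER code. */
struct sketch_volume {
    int online;                 /* 0 once the backing device has gone away */
};

struct sketch_trans {
    struct sketch_volume *vol;
    int active;                 /* 1 while the transaction is open */
};

/* Start a transaction; return 0 or an errno value instead of asserting. */
static int sketch_start_transaction(struct sketch_trans *trans,
                                    struct sketch_volume *vol)
{
    trans->vol = vol;
    trans->active = 0;
    if (vol == NULL || !vol->online)
        return ENXIO;           /* backing device unavailable: report it */
    trans->active = 1;
    return 0;
}

static void sketch_done_transaction(struct sketch_trans *trans)
{
    trans->active = 0;
}

/* Caller in the style of an ioctl handler: abort the sequence on error. */
static int sketch_ioctl(struct sketch_volume *vol)
{
    struct sketch_trans trans;
    int error;

    error = sketch_start_transaction(&trans, vol);
    if (error)
        return error;           /* propagate to the caller, no panic */
    /* ... the real work would happen here ... */
    sketch_done_transaction(&trans);
    return 0;
}
```

With a volume whose online field is 0, sketch_ioctl() returns ENXIO to its caller rather than stopping the program. The cost hinted at in the comment above is that every caller of the start function then needs such an error check.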

-Matt
Matthew Dillon

#6 Updated by alexh about 4 years ago

Matt,

is this fixed properly now?

Cheers,
Alex Hornung
