Project

General

Profile

Actions

Bug #1054

closed

HAMMER crash again

Added by bastyaelvtars over 13 years ago. Updated over 13 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:

Description

panic: hammer left edge case 0000000000004000 16384

Trace beginning at frame 0xcad9dbc0
panic(cad9dbe4,cad9dc5c,4000,0,cad9dc18) at panic+0x8c
panic(c0574358,4000,0,4000,4000) at panic+0x8c
hammer_ip_delete_range(cad9dc5c,cbdd6150,20000,0,ffffffff) at
hammer_ip_delete_r ange
+0x1bf hammer_sync_inode(cbdd6150,cad88100,cad88000) at
hammer_sync_inode+0x1dc hammer_flusher_flush_inode
(cad88114,cad88114,c1666870,cad88000,3f) at
hammer_flu
sher_flush_inode+0x29 hammer_flusher_slave_thread(c1666870,0,0,0,0) at
hammer_flusher_slave_thread
+0xe 1 lwkt_exit() at
lwkt_exit Debugger("panic") Stopped at Debugger+0x34: movb
$0,in_Debugger.3945

I was doing a backtrace by myself. Anyway if you need a dump, tell me.

#0 dumpsys () at ./machine/thread.h:83
#1 0xc02c216e in boot (howto=260)
#at /usr/src/sys/kern/kern_shutdown.c:375 2 0xc02c228f in panic
#(fmt=0xc051bf77 "from debugger")
#at /usr/src/sys/kern/kern_shutdown.c:800 3 0xc01641c9 in db_panic
#(addr=-1068784292, have_addr=0, count=-1, modif=0xcad9da74 "")
#at /usr/src/sys/ddb/db_command.c:447 4 0xc0164834 in db_command_loop
#() at /usr/src/sys/ddb/db_command.c:343 5 0xc0166dac in db_trap
#(type=3, code=0) at /usr/src/sys/ddb/db_trap.c:71 6 0xc04ba6af in
#kdb_trap (type=3, code=0, regs=0xcad9db6c)
#at /usr/src/sys/platform/pc32/i386/db_interface.c:148 7 0xc04cbb22 in
#trap (frame=0xcad9db6c) at /usr/src/sys/platform/pc32/i386/trap.c:832
#8 0xc04bb3c7 in calltrap ()
#at /usr/src/sys/platform/pc32/i386/exception.s:785 9 0xc04ba55c in
#Debugger (msg=0xc053005b "panic") at ./cpu/cpufunc.h:73 10 0xc02c2286
#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
#ran_beg=131072, ran_end=9223372036854775807,
truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
#hammer_flusher_slave_thread (arg=0xc1666870)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
#lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:223
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Actions #1

Updated by dillon over 13 years ago

:panic: hammer left edge case 0000000000004000 16384
:
:...
:#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
:#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
:#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
:#ran_beg=131072, ran_end=9223372036854775807,
: truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
:#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
:#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
:#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
:#hammer_flusher_slave_thread (arg=0xc1666870)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
:#lwkt_deschedule_self (td=Cannot access memory at address 0x8
:) at /usr/src/sys/kern/lwkt_thread.c:223
:Backtrace stopped: previous frame inner to this frame (corrupt stack?)
:
:--
:Gergo Szakal MD <>

Yah, I'd like the dump on that one, just to be sure it is what I think
it is.
It is related to the direct-write work.   Direct writes are allowed to
write in-memory data records which override their on-media countparts
(when seeking into the middle of a file and doing a write). In this
case the above situation has occured and then userland ALSO did a
truncation on the file, causing hammer_ip_delete_range() to be
called.
Because *_delete_range() needs to see both records so it can delete
both records, it has set the HAMMER_CURSOR_DELETE_VISIBILITY flag.
The bug is actually that the assertion itself is wrong.  It sees the
first record, shifts 'ran_beg' to the end of the first record, then
sees the second record's base offset is less then the new ending
offset because the second record is the on-disk version of the first
record. The assertion really needs to be:
if (off < ran_beg && leaf->base.key > ran_beg)
panic(...)
instead of:
if (off < ran_beg)
panic(...)
In anycase, if you could send me the URL of your dump in private email
I'll make sure that the state matches what I think happened.
Nice catch!  340 assertions in HAMMER and this is like only the third
one where the bug turned out to be the assertion itself :-)
-Matt
Matthew Dillon
&lt;&gt;
Actions #2

Updated by dillon over 13 years ago

:panic: hammer left edge case 0000000000004000 16384
:--
:Gergo Szakal MD <>

It's turning out to be more serious.  The mirror propagation is breaking
something. What was the machine doing at the time of the panic?
-Matt
Matthew Dillon
&lt;&gt;
Actions #3

Updated by dillon over 13 years ago

::panic: hammer left edge case 0000000000004000 16384

I've committed 60G which should hopefully deal with both bugs
revealed by this panic and crash dump.
Also, please continue to test filesystem out-of-space issues.  It
should no longer create empty files after write() failures are
reported.
-Matt
Matthew Dillon
&lt;&gt;
Actions #4

Updated by bastyaelvtars over 13 years ago

On Mon, 7 Jul 2008 15:05:31 -0700 (PDT)
Matthew Dillon <> wrote:

I was copying little files within a HAMMER partition. I do not use
mirroring.

Actions #5

Updated by dillon over 13 years ago

:> It's turning out to be more serious. The mirror propagation is
:> breaking something. What was the machine doing at the time of the
:> panic?
:
:I was copying little files within a HAMMER partition. I do not use
:mirroring.
:
:--
:Gergo Szakal MD <>

It was the propagation of the transaction id up the B-Tree, which occurs
whether you are mirroring or not.
It's all fixed now :-)
-Matt
Matthew Dillon
&lt;&gt;
Actions #6

Updated by matthias over 13 years ago

Bug is fixed according to dillon@ in msg4887

Actions

Also available in: Atom PDF