Bug #1054: HAMMER crash again - DragonFlyBSD - DragonFlyBSD bugtracker

Actions

Copy link

Bug #1054

closed

HAMMER crash again

Added by bastyaelvtars over 17 years ago. Updated over 17 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Category:

Target version:

Start date:

Due date:

% Done:

Estimated time:

Description

panic: hammer left edge case 0000000000004000 16384

Trace beginning at frame 0xcad9dbc0
panic(cad9dbe4,cad9dc5c,4000,0,cad9dc18) at panic+0x8c
panic(c0574358,4000,0,4000,4000) at panic+0x8c
hammer_ip_delete_range(cad9dc5c,cbdd6150,20000,0,ffffffff) at
hammer_ip_delete_r ange
+0x1bf hammer_sync_inode(cbdd6150,cad88100,cad88000) at
hammer_sync_inode+0x1dc hammer_flusher_flush_inode
(cad88114,cad88114,c1666870,cad88000,3f) at
hammer_flu
sher_flush_inode+0x29 hammer_flusher_slave_thread(c1666870,0,0,0,0) at
hammer_flusher_slave_thread
+0xe 1 lwkt_exit() at
lwkt_exit Debugger("panic") Stopped at Debugger+0x34: movb
$0,in_Debugger.3945

I was doing a backtrace by myself. Anyway if you need a dump, tell me.

#0 dumpsys () at ./machine/thread.h:83
#1 0xc02c216e in boot (howto=260)
#at /usr/src/sys/kern/kern_shutdown.c:375 2 0xc02c228f in panic
#(fmt=0xc051bf77 "from debugger")
#at /usr/src/sys/kern/kern_shutdown.c:800 3 0xc01641c9 in db_panic
#(addr=-1068784292, have_addr=0, count=-1, modif=0xcad9da74 "")
#at /usr/src/sys/ddb/db_command.c:447 4 0xc0164834 in db_command_loop
#() at /usr/src/sys/ddb/db_command.c:343 5 0xc0166dac in db_trap
#(type=3, code=0) at /usr/src/sys/ddb/db_trap.c:71 6 0xc04ba6af in
#kdb_trap (type=3, code=0, regs=0xcad9db6c)
#at /usr/src/sys/platform/pc32/i386/db_interface.c:148 7 0xc04cbb22 in
#trap (frame=0xcad9db6c) at /usr/src/sys/platform/pc32/i386/trap.c:832
#8 0xc04bb3c7 in calltrap ()
#at /usr/src/sys/platform/pc32/i386/exception.s:785 9 0xc04ba55c in
#Debugger (msg=0xc053005b "panic") at ./cpu/cpufunc.h:73 10 0xc02c2286
#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
#ran_beg=131072, ran_end=9223372036854775807,
truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
#hammer_flusher_slave_thread (arg=0xc1666870)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
#lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:223
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Actions

Copy link

Updated by dillon over 17 years ago

:panic: hammer left edge case 0000000000004000 16384
:
:...
:#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
:#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
:#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
:#ran_beg=131072, ran_end=9223372036854775807,
: truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
:#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
:#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
:#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
:#hammer_flusher_slave_thread (arg=0xc1666870)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
:#lwkt_deschedule_self (td=Cannot access memory at address 0x8
:) at /usr/src/sys/kern/lwkt_thread.c:223
:Backtrace stopped: previous frame inner to this frame (corrupt stack?)
:
:--
:Gergo Szakal MD <bastyaelvtars@gmail.com>

Yah, I'd like the dump on that one, just to be sure it is what I think
    it is.

It is related to the direct-write work.   Direct writes are allowed to
    write in-memory data records which override their on-media countparts
    (when seeking into the middle of a file and doing a write).  In this
    case the above situation has occured and then userland ALSO did a
    truncation on the file, causing hammer_ip_delete_range() to be
    called.

Because *_delete_range() needs to see both records so it can delete
    both records, it has set the HAMMER_CURSOR_DELETE_VISIBILITY flag.

The bug is actually that the assertion itself is wrong.  It sees the
    first record, shifts 'ran_beg' to the end of the first record, then
    sees the second record's base offset is less then the new ending
    offset because the second record is the on-disk version of the first
    record.  The assertion really needs to be:

if (off < ran_beg && leaf->base.key > ran_beg)
        panic(...)

instead of:

if (off < ran_beg)
        panic(...)

In anycase, if you could send me the URL of your dump in private email
    I'll make sure that the state matches what I think happened.

Nice catch!  340 assertions in HAMMER and this is like only the third
    one where the bug turned out to be the assertion itself :-)

-Matt
                    Matthew Dillon 
                    &lt;dillon@backplane.com&gt;

Actions

Copy link

Updated by dillon over 17 years ago

:panic: hammer left edge case 0000000000004000 16384
:--
:Gergo Szakal MD <bastyaelvtars@gmail.com>

It's turning out to be more serious.  The mirror propagation is breaking
    something.  What was the machine doing at the time of the panic?

-Matt
                    Matthew Dillon 
                    &lt;dillon@backplane.com&gt;

Actions

Copy link

Updated by dillon over 17 years ago

::panic: hammer left edge case 0000000000004000 16384

I've committed 60G which should hopefully deal with both bugs
    revealed by this panic and crash dump.

Also, please continue to test filesystem out-of-space issues.  It
    should no longer create empty files after write() failures are
    reported.

-Matt
                    Matthew Dillon 
                    &lt;dillon@backplane.com&gt;

Actions

Copy link

Updated by bastyaelvtars over 17 years ago

On Mon, 7 Jul 2008 15:05:31 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:

I was copying little files within a HAMMER partition. I do not use
mirroring.

Actions

Copy link

Updated by dillon over 17 years ago

:> It's turning out to be more serious. The mirror propagation is
:> breaking something. What was the machine doing at the time of the
:> panic?
:
:I was copying little files within a HAMMER partition. I do not use
:mirroring.
:
:--
:Gergo Szakal MD <bastyaelvtars@gmail.com>

It was the propagation of the transaction id up the B-Tree, which occurs
    whether you are mirroring or not.

It's all fixed now :-)

-Matt
                    Matthew Dillon 
                    &lt;dillon@backplane.com&gt;

Actions

Copy link

Updated by matthias over 17 years ago

Bug is fixed according to dillon@ in msg4887

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

DragonFlyBSD

Bug #1054

HAMMER crash again

Updated by dillon over 17 years ago

Updated by dillon over 17 years ago

Updated by dillon over 17 years ago

Updated by bastyaelvtars over 17 years ago

Updated by dillon over 17 years ago

Updated by matthias over 17 years ago