DragonFly BSD
 

Issue1054

Title HAMMER crash again
Priority bug Status resolved
Superseder Nosy List bastyaelvtars
Assigned To Topics

Created on 2008-07-07.18:08:01 by bastyaelvtars, last changed 2008-08-20.07:21:15 by matthias.

Messages
msg5288 (view) Author: matthias Date: 2008-08-20.07:21:15
Bug is fixed according to dillon@ in msg4887
msg4887 (view) Author: dillon Date: 2008-07-08.17:14:00
:>     It's turning out to be more serious.  The mirror propagation is
:> breaking something.  What was the machine doing at the time of the
:> panic?
:
:I was copying little files within a HAMMER partition. I do not use
:mirroring.
:
:-- 
:Gergo Szakal MD <bastyaelvtars@gmail.com>

    It was the propagation of the transaction id up the B-Tree, which occurs
    whether you are mirroring or not.

    It's all fixed now :-)

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4886 (view) Author: bastyaelvtars Date: 2008-07-08.16:08:03
On Mon, 7 Jul 2008 15:05:31 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:

I was copying little files within a HAMMER partition. I do not use
mirroring.
msg4882 (view) Author: dillon Date: 2008-07-07.22:47:06
::panic: hammer left edge case 0000000000004000 16384

    I've committed 60G which should hopefully deal with both bugs
    revealed by this panic and crash dump.

    Also, please continue to test filesystem out-of-space issues.  It
    should no longer create empty files after write() failures are
    reported.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4881 (view) Author: dillon Date: 2008-07-07.22:08:01
:panic: hammer left edge case 0000000000004000 16384
:-- 
:Gergo Szakal MD <bastyaelvtars@gmail.com>

    It's turning out to be more serious.  The mirror propagation is breaking
    something.  What was the machine doing at the time of the panic?

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4878 (view) Author: dillon Date: 2008-07-07.19:32:01
:panic: hammer left edge case 0000000000004000 16384
:
:...
:#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
:#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
:#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
:#ran_beg=131072, ran_end=9223372036854775807,
:    truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
:#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
:#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
:#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
:#hammer_flusher_slave_thread (arg=0xc1666870)
:#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
:#lwkt_deschedule_self (td=Cannot access memory at address 0x8
:) at /usr/src/sys/kern/lwkt_thread.c:223
:Backtrace stopped: previous frame inner to this frame (corrupt stack?)
:
:-- 
:Gergo Szakal MD <bastyaelvtars@gmail.com>

    Yah, I'd like the dump on that one, just to be sure it is what I think
    it is.

    It is related to the direct-write work.   Direct writes are allowed to
    write in-memory data records which override their on-media countparts
    (when seeking into the middle of a file and doing a write).  In this
    case the above situation has occured and then userland ALSO did a
    truncation on the file, causing hammer_ip_delete_range() to be
    called.

    Because *_delete_range() needs to see both records so it can delete
    both records, it has set the HAMMER_CURSOR_DELETE_VISIBILITY flag.

    The bug is actually that the assertion itself is wrong.  It sees the
    first record, shifts 'ran_beg' to the end of the first record, then
    sees the second record's base offset is less then the new ending
    offset because the second record is the on-disk version of the first
    record.  The assertion really needs to be:

	if (off < ran_beg && leaf->base.key > ran_beg)
	    panic(...)

    instead of:

	if (off < ran_beg)
		panic(...)

    In anycase, if you could send me the URL of your dump in private email
    I'll make sure that the state matches what I think happened.

    Nice catch!  340 assertions in HAMMER and this is like only the third
    one where the bug turned out to be the assertion itself :-)

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4873 (view) Author: bastyaelvtars Date: 2008-07-07.18:08:00
panic: hammer left edge case 0000000000004000 16384

Trace beginning at frame 0xcad9dbc0
panic(cad9dbe4,cad9dc5c,4000,0,cad9dc18) at panic+0x8c
panic(c0574358,4000,0,4000,4000) at panic+0x8c
hammer_ip_delete_range(cad9dc5c,cbdd6150,20000,0,ffffffff) at
hammer_ip_delete_r                                                ange
+0x1bf hammer_sync_inode(cbdd6150,cad88100,cad88000) at
hammer_sync_inode+0x1dc hammer_flusher_flush_inode
(cad88114,cad88114,c1666870,cad88000,3f) at
hammer_flu
sher_flush_inode+0x29 hammer_flusher_slave_thread(c1666870,0,0,0,0) at
hammer_flusher_slave_thread
+0xe                                                1 lwkt_exit() at
lwkt_exit Debugger("panic") Stopped at      Debugger+0x34:  movb
$0,in_Debugger.3945

I was doing a backtrace by myself. Anyway if you need a dump, tell me.

#0  dumpsys () at ./machine/thread.h:83
#1  0xc02c216e in boot (howto=260)
#at /usr/src/sys/kern/kern_shutdown.c:375 2  0xc02c228f in panic
#(fmt=0xc051bf77 "from debugger")
#at /usr/src/sys/kern/kern_shutdown.c:800 3  0xc01641c9 in db_panic
#(addr=-1068784292, have_addr=0, count=-1, modif=0xcad9da74 "")
#at /usr/src/sys/ddb/db_command.c:447 4  0xc0164834 in db_command_loop
#() at /usr/src/sys/ddb/db_command.c:343 5  0xc0166dac in db_trap
#(type=3, code=0) at /usr/src/sys/ddb/db_trap.c:71 6  0xc04ba6af in
#kdb_trap (type=3, code=0, regs=0xcad9db6c)
#at /usr/src/sys/platform/pc32/i386/db_interface.c:148 7  0xc04cbb22 in
#trap (frame=0xcad9db6c) at /usr/src/sys/platform/pc32/i386/trap.c:832
#8  0xc04bb3c7 in calltrap ()
#at /usr/src/sys/platform/pc32/i386/exception.s:785 9  0xc04ba55c in
#Debugger (msg=0xc053005b "panic") at ./cpu/cpufunc.h:73 10 0xc02c2286
#in panic (fmt=0xc0574358 "hammer left edge case %016llx %d\n")
#at /usr/src/sys/kern/kern_shutdown.c:798 11 0xc042f905 in
#hammer_ip_delete_range (cursor=0xcad9dc5c, ip=0xcbdd6150,
#ran_beg=131072, ran_end=9223372036854775807,
    truncating=2) at /usr/src/sys/vfs/hammer/hammer_object.c:1688
#12 0xc0428c5c in hammer_sync_inode (ip=0xcbdd6150)
#at /usr/src/sys/vfs/hammer/hammer_inode.c:2172 13 0xc0426f55 in
#hammer_flusher_flush_inode (ip=0xcbdd6150, trans=0xcad88114)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:361 14 0xc0427133 in
#hammer_flusher_slave_thread (arg=0xc1666870)
#at /usr/src/sys/vfs/hammer/hammer_flusher.c:236 15 0xc02c8ee8 in
#lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:223
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
History
Date User Action Args
2008-08-20 07:21:15matthiassetpriority: bug
status: chatting -> resolved
messages: + msg5288
2008-07-08 17:14:01dillonsetmessages: + msg4887
2008-07-08 16:08:06bastyaelvtarssetmessages: + msg4886
2008-07-07 22:47:12dillonsetmessages: + msg4882
2008-07-07 22:08:02dillonsetmessages: + msg4881
2008-07-07 19:32:05dillonsetstatus: unread -> chatting
messages: + msg4878
2008-07-07 18:08:01bastyaelvtarscreate