Bug #1729: Hammer REDO recovery panic

Added by vsrinivas about 14 years ago. Updated over 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Restarting my system after an earlier panic from running fsstress on HAMMER,
fsync_mode=2. Running DragonFly 2.6.0-gfa1ae. Will try to have a dump tomorrow.

Mounting root from hammer:serno/G3H3DSKC.s1d
tryroot serno/G3H2DSKC.s1d
HAMMER recovery check seqno=023f0d60
HAMMER recovery range 3000000003838438-3000000003fee500
HAMMER recovery nexto 3000000003fee500 endseqno=023f6204
HAMMER recovery undo 3000000003838438-30000000003fee500 (8085704
bytes)(RW)
HAMMER Continuing recovery
HAMMER Continuing recovery
HAMMER Continuing recovery
HAMMER Continuing recovery
HAMMER Found REDO_SYNC 30000000025c4a38
HAMMER recovery complete
HAMMER recovery redo 3000000003838438-30000000003fee500 (80857604
bytes)(RW)
HAMMER Find extended redo 30000000025c4a38, 19347968 extbytes
HAMMER Find extended redo failed 34, unable to run REDO
HAMMER End redo recovery
panic: hammer_ref_interlock_true: bad lock 0xc758a688 00000001

Trace beginning at frame 0xc05d4af4
panic(c05d4b18,50000001,c758a680,c02ed226,c05d4b24) at panic+0x8c
panic(c03e2510,c758a688,1,c758a680,c05d4b34) at panic+0x8c
hammer_ref_interlock_true(c758a688,c758a680,c05d4b5c,c02ed31c,c758a680) at
hammer_ref_interlock_true+0x25
hammer_unload_buffer(c758a680,0,0,0,c758a600) at hammer_unload_buffer+0x38
hammer_buf_rb_tree_RB_SCAN(c740d034,0,c02ee747,0,c1508040) at
hammer_buf_rb_tree_RB_SCAN+0xad
hammer_free_hmp(c740d384,c1225018,1,22,c10c2180) at hammer_free_hmp+0x13b
hammer_vfs_mount(c70d17b8,0,0,c612f5b0,c05d4cec) at hammer_vfs_mount+0xa57
vfs_mount(c70d17b8,0,0,c612f5b0,c10c2180) at vfs_mount+0x32
vfs_mountroot_try(c03bbab0)
vfs_mountroot(0,ffffffff,5d1c00,5df000,5df000) at vfs_mountroot+0x7b
mi_startup(5d1000,0,0,0,0) at mi_startup+0x92
begin() at begin+0x42
Debugger("panic")
Stopped at Debugger+0x34: movb $0,in_Debugger.4308
db>

#1

Updated by dillon about 14 years ago

:New submission from Venkatesh Srinivas <>:
:
:Restarting my system after an earlier panic from running fsstress on HAMMER,
:fsync_mode=2. Running DragonFly 2.6.0-gfa1ae. Will try to have a dump tomorrow.
:
:...

Ok, I found the bug related to the panic.  I have committed a
fix to head and will MFC to 2.6.x.
The REDO error itself is another problem.  That error is not supposed
to happen. Please run the command 'hammer -f <device> show-undo' and
put the output on your leaf account. How large is the HAMMER filesystem?
(Hopefully the data hasn't been lost since that time, I'm crossing my
fingers).
You may have issues booting.  You need to boot with a fixed kernel to
get past the panic, and that may require booting from a USB stick or
something.
-Matt
Matthew Dillon
<>
#2

Updated by vsrinivas almost 14 years ago

http://acm.jhu.edu/~me/redo_panic holds the show-undo for the fs (undo.gz) and
the vmcore/kern for the panic that occurs when you attempt to mount it.

#3

Updated by vsrinivas almost 14 years ago

Have you had a chance to look at this redo_panic log?

I'd like to repurpose the disk holding this fs soon, but if you'd like or think
there's a chance to recover the fs, I'll keep it around.

In the future, for fses with a REDO fifo problem, would it make sense to offer a
'really read-only' mount that doesn't attempt to replay the redo fifo? That way,
we'd at least stand a chance at salvaging data...

Thanks,
-- vs

#4

Updated by dillon almost 14 years ago

:Have you had a chance to look at this redo_panic log?
:
:I'd like to repurpose the disk holding this fs soon, but if you'd like or think
:there's a chance to recover the fs, I'll keep it around.
:
:In the future, for fses with a REDO fifo problem, would it make sense to offer a
:'really read-only' mount that doesn't attempt to replay the redo fifo? That way,
:we'd at least stand a chance at salvaging data...
:
:Thanks,
:-- vs

The lock panic was due to a mismatched lock/unlock, which should now
be fixed.
An UNDO recovery error is fatal, but a REDO recovery error is not
fatal. It does mean the REDO replay failed, but the filesystem itself
will wind up in a working state.
-Matt
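
For illustration, here is a minimal, self-contained C sketch of the policy Matt describes above: a failed UNDO stage refuses the mount, while a failed REDO stage is reported and the mount continues. It is not HAMMER source; the function names are invented, and error 34 merely mimics the boot log quoted in the description.

/*
 * Illustrative sketch only, not HAMMER code.  It models the recovery
 * policy described above: UNDO failure aborts the mount, REDO failure
 * is reported but the mount proceeds, because the volume is already
 * consistent once the UNDO pass has run.
 */
#include <stdio.h>

static int
run_undo_stage(void)
{
        return 0;               /* pretend the UNDO replay succeeded */
}

static int
run_redo_stage(void)
{
        return 34;              /* pretend the REDO replay failed, as in the log */
}

static int
mount_recover(void)
{
        int error;

        error = run_undo_stage();
        if (error) {
                /* Fatal: without UNDO the on-disk structures may be bad. */
                fprintf(stderr, "UNDO recovery failed (%d), mount refused\n",
                        error);
                return (error);
        }

        error = run_redo_stage();
        if (error) {
                /*
                 * Non-fatal: only recently fsync()'d file data is at risk;
                 * the filesystem itself should still be mountable.
                 */
                fprintf(stderr, "REDO recovery failed (%d), continuing mount\n",
                        error);
        }
        return (0);
}

int
main(void)
{
        return (mount_recover() ? 1 : 0);
}

Under this policy the mount in comment #5 should have succeeded despite the failed REDO pass; that it did not is the behaviour Matt says he needs to track down in comment #6.
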
#5

Updated by vsrinivas almost 14 years ago

Oh, I guess I wasn't clear; even after the lock fix, I am unable to mount the
filesystem. During mount, the failure to run REDO prevents the fs from mounting...

-- vs

#6

Updated by dillon almost 14 years ago

:Venkatesh Srinivas <> added the comment:
:
:Oh, I guess I wasn't clear; even after the lock fix, I am unable to mount the
:filesystem. During mount, the failure to run REDO prevents the fs from mounting...
:
:-- vs

No, you were clear.  I guess what I need to do is to hack the code
to force a failure during the redo run and track down why it isn't
allowing the mount.
Matthew Dillon 
<>
#7

Updated by tuxillo over 11 years ago

  • Status changed from New to Closed
  • Assignee changed from 0 to tuxillo

Venk,

I think this commit would help you mount in that case: http://gitweb.dragonflybsd.org/dragonfly.git/commit/dbd4f60002b98556e6fc8413e6eacf2aedfce6df
Since the mismatched locking was corrected by Matt and there is now a workaround to avoid the REDO stage (so you are able to mount in that situation), I will close this ticket.

Cheers,
Antonio Huete
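
As a rough illustration of the workaround Antonio mentions, the sketch below shows the general idea of a tunable that lets the mount path skip REDO replay entirely, so a volume whose REDO range cannot be replayed can still be mounted once UNDO recovery has made it consistent. The variable name, values, and behaviour here are hypothetical; the actual mechanism is whatever the linked commit implements.

/*
 * Hypothetical illustration, not the code from the commit above:
 * a tunable that lets a mount bypass REDO replay entirely.  The
 * name "skip_redo" and its semantics are invented for this example.
 */
#include <stdio.h>

static int skip_redo = 1;       /* imagine this set via a sysctl or loader knob */

static int
run_redo_stage(void)
{
        return 34;              /* simulate the failure from the boot log */
}

static int
mount_redo_phase(void)
{
        int error;

        if (skip_redo) {
                /* Operator asked us not to attempt REDO replay at all. */
                fprintf(stderr, "REDO replay skipped by tunable\n");
                return (0);
        }
        error = run_redo_stage();
        if (error)
                fprintf(stderr, "REDO replay failed (%d)\n", error);
        return (error);
}

int
main(void)
{
        return (mount_redo_phase() ? 1 : 0);
}

A knob of this kind is close in spirit to the 'really read-only' mount vsrinivas asked about in comment #3: it trades REDO replay for the ability to mount and salvage data.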
