Bug #2374

softupdates locking bug

Added by vsrinivas almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Start date: 05/23/2012
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

softupdates may still have some locking issues:

In -master:
* http://leaf.dragonflybsd.org/~marino/core/core.20120523.txt
Panic where softdep_update_inodeblock() called bwrite() with a NULL buffer

getdirtybuf() reported success ('gotit'), yet either it returned a NULL bp or the buffer pointer was
nulled after it was saved in the inodedep structure. getdirtybuf() can block, and it releases the softdep
lock while locking dirty buffers, but it is not clear whether another thread can race in during that window and produce the failure mode seen.

* (from 3.0.3 catchall bug (2336)):
Deadlock in -master with softdep. No more details available.
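
To make the suspected getdirtybuf() window in the first item concrete, here is a minimal single-threaded sketch in C. It is not the kernel code: `struct buf` and `struct inodedep` are reduced to hypothetical one-field stand-ins, and the `race_hook` callback, `block_on_buffer()`, `get_buf_unchecked()` and `get_buf_revalidated()` are invented names that only model "the lock was dropped while we slept, and someone else ran". It shows why a success return is only trustworthy if the saved pointer is re-checked after blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct buf { int dirty; };
struct inodedep { struct buf *id_buf; };

/* Runs while the lock is "dropped"; models another thread racing in. */
typedef void (*race_hook)(struct inodedep *);

/* Example interloper: clears the saved buffer pointer. */
static void clear_saved_buf(struct inodedep *dep)
{
    dep->id_buf = NULL;
}

/* Models blocking on a buffer lock with the dependency lock released. */
static void block_on_buffer(struct inodedep *dep, race_hook hook)
{
    if (hook != NULL)
        hook(dep);
}

/* Unsafe variant: reports success without re-checking after blocking. */
static bool get_buf_unchecked(struct inodedep *dep, race_hook hook,
                              struct buf **out)
{
    assert(dep != NULL && out != NULL);
    if (dep->id_buf == NULL)
        return false;
    block_on_buffer(dep, hook);
    *out = dep->id_buf;         /* may already be NULL here */
    return true;
}

/* Safer variant: success only if the saved pointer is unchanged. */
static bool get_buf_revalidated(struct inodedep *dep, race_hook hook,
                                struct buf **out)
{
    assert(dep != NULL && out != NULL);
    struct buf *bp = dep->id_buf;

    *out = NULL;
    if (bp == NULL)
        return false;
    block_on_buffer(dep, hook);
    if (dep->id_buf != bp)      /* pointer changed while we slept */
        return false;
    *out = bp;
    return true;
}
```

Calling `get_buf_unchecked()` with `clear_saved_buf` as the hook returns true while leaving the output pointer NULL, which is the shape of the panic above; `get_buf_revalidated()` returns false in the same situation. Whether the real race looks like this is an assumption, not something the core dump confirms.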


Related issues

Related to Bug #2336: 3.0.3 catchall Resolved 03/26/2012
Related to Bug #2402: Showstopper panics for Release 3.2 New 08/15/2012

History

#1 Updated by marino almost 2 years ago

Perhaps related:

Occasionally I see this kernel message while packages are building:
softdep_setup_freeblocks_bp(1): caught <id> going away

#2 Updated by marino over 1 year ago

A new bug related to soft updates: panic: flush_pagedep_deps: MKDIR_BODY
Full core text: http://leaf.dragonflybsd.org/~marino/core/core.flush_pagedep_deps.txt

core file located at ~/marino/crash on leaf: core.flush_pagedep_deps.txz

#3 Updated by marino over 1 year ago

I hit this exact panic again today.
Do you need the core or is the first one good enough?

#4 Updated by vsrinivas over 1 year ago

Commit 8224c9ea7d94389a63b07be4401f0b05912f8f4a likely fixes this bug; before it, getdirtybuf could incorrectly return success.

#5 Updated by vsrinivas over 1 year ago

  • Status changed from New to Feedback

#6 Updated by vsrinivas over 1 year ago

Some hours of fsstress testing with the patch applied hit a deadlock involving softdep. I haven't been able to root-cause it, but here are some hints:

1) the syncer (syncer0) is waiting for vnlru to make progress; its backtrace is:

(bioops callback)
softdep_process_worklist
process_worklist_item
handle_workitem_remove
(WE DO NOT HOLD THE SOFTDEP LOCK AROUND VFS_VGET)
vfs_vget
ffs_vget
getnewvnode
allocvnode
vnlru_proc_wait

2) vnlru is not making progress; it is trying to lock a buffer associated with UFS. Its backtrace is:
ssleep
acquire
lockmgr
(BUF_TIMELOCK)
vinvalbuf_bp
vlrureclaim
mountlist_scan

3) The buffer in question is a BUF_CMD_WRITE buffer; its lock is marked with LK_KERNTHREAD, and it is a softdep buffer (seen via b_ops being the softdep bioops). The vnode associated with the buffer is held locked by the vnlru thread. I think these are the buffer's flags: B_CACHE|B_HASHED|B_BNOCLIP|B_IODEBUG|B_VNCLEAN|B_VMIO.
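
The three hints can be written down as a small wait-for chain, which makes it easier to see where the known information stops. The sketch below is only a bookkeeping aid in C: the node names are labels taken from the hints, and `struct wait_edge`, `blocker_of()` and `chain_end()` are hypothetical helpers, not kernel interfaces. It encodes only what was observed, so the chain ends at the LK_KERNTHREAD-marked lock rather than closing into a cycle; what that lock is in turn waiting on is exactly the part not yet root-caused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One observed "A is waiting on B" relation; names are just labels. */
struct wait_edge {
    const char *waiter;
    const char *waits_for;
};

/* Only the relations reported in the hints; nothing is inferred. */
static const struct wait_edge edges[] = {
    { "syncer0", "vnlru" },
    { "vnlru", "softdep buffer lock" },
    { "softdep buffer lock", "LK_KERNTHREAD" },
};
#define N_EDGES (sizeof(edges) / sizeof(edges[0]))

/* Returns what `who` is waiting on, or NULL if that is unknown. */
static const char *blocker_of(const char *who)
{
    for (size_t i = 0; i < N_EDGES; i++)
        if (strcmp(edges[i].waiter, who) == 0)
            return edges[i].waits_for;
    return NULL;
}

/*
 * Follows the chain from `start`. Returns the last node reached and
 * sets *cycle to 1 only if the chain leads back to `start`.
 */
static const char *chain_end(const char *start, int *cycle)
{
    assert(start != NULL && cycle != NULL);
    const char *cur = start;

    *cycle = 0;
    for (size_t hops = 0; hops < N_EDGES; hops++) {
        const char *next = blocker_of(cur);
        if (next == NULL)
            break;              /* no further information */
        if (strcmp(next, start) == 0) {
            *cycle = 1;         /* chain closed on itself */
            break;
        }
        cur = next;
    }
    return cur;
}
```

Starting from "syncer0", `chain_end()` stops at "LK_KERNTHREAD" with the cycle flag clear. Adding an edge from that node back to the syncer would close the loop, but that edge is a guess and is deliberately left out.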

#7 Updated by vsrinivas over 1 year ago

  • Status changed from Feedback to Resolved

The vnode LRU deadlock was fixed by commit 62ae46c924bd3c2efd985c79dac02be03360e6a6. The flush_pagedep_deps panic was fixed by commit ca55765aeb1b1a6aa5f39b49ea1e514c7ab60178.
