Bug #2374

closed

softupdates locking bug

Added by vsrinivas over 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 05/23/2012
Due date:
% Done: 0%
Estimated time:

Description

softupdates may still have some locking issues:

In -master:

getdirtybuf returned 'gotit', yet either it returned a NULL bp or the buffer was nulled after it was saved in the inodedep structure. getdirtybuf can block, and it releases the softdep lock while locking dirty buffers, but it is not clear whether another thread can race in and produce the failure mode seen.

  • From the 3.0.3 catchall bug (#2336):
    Deadlock in -master with softdep. No further details available.

Related issues: 2 (0 open, 2 closed)

Related to Bug #2336: 3.0.3 catchall (Resolved, 03/26/2012)
Related to Bug #2402: Showstopper panics for Release 3.2 (Closed, tuxillo, 08/15/2012)

#1

Updated by marino over 12 years ago

Perhaps related:

Occasionally I see this kernel message while packages are building:
softdep_setup_freeblocks_bp(1): caught <id> going away

#2

Updated by marino about 12 years ago

A new bug related to soft updates: panic: flush_pagedep_deps: MKDIR_BODY
Full core text: http://leaf.dragonflybsd.org/~marino/core/core.flush_pagedep_deps.txt

Core file located in ~/marino/crash on leaf: core.flush_pagedep_deps.txz

#3

Updated by marino about 12 years ago

I hit this exact panic again today.
Do you need the core or is the first one good enough?

#4

Updated by vsrinivas about 12 years ago

Commit 8224c9ea7d94389a63b07be4401f0b05912f8f4a likely fixes this bug; previously, getdirtybuf could incorrectly return success.

#5

Updated by vsrinivas about 12 years ago

  • Status changed from New to Feedback

#6

Updated by vsrinivas about 12 years ago

Several hours of fsstress testing with the patch applied hit a deadlock involving softdep; I haven't been able to root-cause it, but here are some hints:

1) The syncer (syncer0) is waiting for vnlru to make progress; its backtrace is:

(bioops callback)
softdep_process_worklist
process_worklist_item
handle_workitem_remove
(WE DO NOT HOLD THE SOFTDEP LOCK AROUND VFS_VGET)
vfs_vget
ffs_vget
getnewvnode
allocvnode
vnlru_proc_wait

2) vnlru is not making progress; it is trying to lock a buffer associated with UFS. Its backtrace is:

ssleep
acquire
lockmgr
(BUF_TIMELOCK)
vinvalbuf_bp
vlrureclaim
mountlist_scan

3) The buffer in question is a BUF_CMD_WRITE buffer whose lock is marked LK_KERNTHREAD, and it is a softdep buffer (its b_ops is the softdep bioops). The vnode associated with the buffer is held locked by the vnlru thread. I believe the buffer's flags are: B_CACHE|B_HASHED|B_BNOCLIP|B_IODEBUG|B_VNCLEAN|B_VMIO.

#7

Updated by vsrinivas almost 12 years ago

  • Status changed from Feedback to Resolved

The vnode LRU deadlock was fixed by commit 62ae46c924bd3c2efd985c79dac02be03360e6a6; the flush_pagedep_deps panic was fixed by commit ca55765aeb1b1a6aa5f39b49ea1e514c7ab60178.
