Submit #2842
closed[PATCH] sys/vfs/hammer: Don't ex-lock all children on internal split
0%
Description
Matt, please review this.
The diff is also attached to this PR.
=====
Date: Sun, 6 Sep 2015 00:17:00 +0900
Subject: [PATCH] sys/vfs/hammer: Don't ex-lock all children on internal split
hammer_split_internal() ex-locks all children of the node
to split via hammer_btree_lock_children(), but it actually
only needs to lock children at (index >= split).
Children at (index >= 0 && index < split) are not touched
by internal split and don't get affected by this operation.
In the following example, internal split does nothing to
E with * and their child nodes.
This commit adds an extra argument int start_index to
hammer_btree_lock_children() to specify from which child
this function needs to ex-lock.
=====
before split
I
0 1 2 3 ... n
E E E E E E E E E R
/
/
I 62
0 ... s ... 61|
E E E E ... E E E R
* * *
=====
after split
I n+1
0 1 2 3 ... p |
E E E E E E E E E E R
/ \
/ \
I \
0 ... s \
E E E R \
* * * I
0 1 2 ...
E E E ... R
---
sys/vfs/hammer/hammer.h | 1
sys/vfs/hammer/hammer_btree.c | 20 ++++++++++---------
sys/vfs/hammer/hammer_rebalance.c | 2
sys/vfs/hammer/hammer_reblock.c | 2 +
4 files changed, 14 insertions(), 11 deletions(-)
diff --git a/sys/vfs/hammer/hammer.h b/sys/vfs/hammer/hammer.h
index c47889f..b35f2e9 100644
--- a/sys/vfs/hammer/hammer.h
++ b/sys/vfs/hammer/hammer.h@ -1265,6 +1265,7
@ void hammer_btree_lcache_init(hammer_mount_t hmp, hammer_node_lock_t lcache,
int depth);
void hammer_btree_lcache_free(hammer_mount_t hmp, hammer_node_lock_t lcache);
int hammer_btree_lock_children(hammer_cursor_t cursor, int depth,
int start_index,
hammer_node_lock_t parent,
hammer_node_lock_t lcache);
void hammer_btree_lock_copy(hammer_cursor_t cursor,
diff --git a/sys/vfs/hammer/hammer_btree.c b/sys/vfs/hammer/hammer_btree.c
index a34e6ca..cbd356a 100644
--- a/sys/vfs/hammer/hammer_btree.c
+++ b/sys/vfs/hammer/hammer_btree.c@ -1450,13 +1450,8
@ btree_split_internal(hammer_cursor_t cursor)
int i;
const int esize = sizeof(*elm);
- hammer_node_lock_init(&lockroot, cursor->node);
- error = hammer_btree_lock_children(cursor, 1, &lockroot, NULL);
- if (error)
- goto done;
if ((error = hammer_cursor_upgrade(cursor)) != 0)
- goto done;
- +hammer_stats_btree_splits;
return(error);
/*
* Calculate the split point. If the insertion point is at the
@ -1485,6 +1480,12
@ btree_split_internal(hammer_cursor_t cursor)
--split;
}
+ hammer_node_lock_init(&lockroot, cursor->node);
+ error = hammer_btree_lock_children(cursor, 1, split, &lockroot, NULL);
+ if (error)
+ goto done;
+ +hammer_stats_btree_splits;
/*
* If we are at the root of the filesystem, create a new root node
* with 1 element and split normally. Avoid making major@ -2631,7 +2632,7
@ hammer_btree_lcache_free(hammer_mount_t hmp, hammer_node_lock_t lcache)
* is usually aliased from a cursor.
*/
int
-hammer_btree_lock_children(hammer_cursor_t cursor, int depth,
+hammer_btree_lock_children(hammer_cursor_t cursor, int depth, int start_index,
hammer_node_lock_t parent,
hammer_node_lock_t lcache)
{@ -2657,7 +2658,7
@ hammer_btree_lock_children(hammer_cursor_t cursor, int depth,
* pre-get the children before trying to lock the mess. This is
* only done one-level deep for now.
/
- for (i = 0; i < ondisk->count; +i) {
for (i = start_index; i < ondisk->count; ++i) {
++hammer_stats_btree_elements;
elm = &ondisk->elms[i];
child = hammer_get_node(cursor->trans,@ -2670,7 +2671,7
@ hammer_btree_lock_children(hammer_cursor_t cursor, int depth,
/
* Do it for real
*/
- for (i = 0; error 0 && i < ondisk->count; +i) {
for (i = start_index; error 0 && i < ondisk->count; ++i) {
++hammer_stats_btree_elements;
elm = &ondisk->elms[i];
@ -2712,6 +2713,7
@ hammer_btree_lock_children(hammer_cursor_t cursor, int depth,
error = hammer_btree_lock_children(
cursor,
depth - 1,
+ 0,
item,
lcache);
}
diff --git a/sys/vfs/hammer/hammer_rebalance.c b/sys/vfs/hammer/hammer_rebalance.c
index 28247bd..0788d6d 100644
--- a/sys/vfs/hammer/hammer_rebalance.c
++ b/sys/vfs/hammer/hammer_rebalance.c@ -285,7 +285,7
@ rebalance_node(struct hammer_ioc_rebalance *rebal, hammer_cursor_t cursor,
error = hammer_cursor_upgrade(cursor);
if (error)
goto done;
- error = hammer_btree_lock_children(cursor, 2, &lockroot, lcache);
error = hammer_btree_lock_children(cursor, 2, 0, &lockroot, lcache);
if (error)
goto done;
diff --git a/sys/vfs/hammer/hammer_reblock.c b/sys/vfs/hammer/hammer_reblock.c
index 21aef3d..c002367 100644
--- a/sys/vfs/hammer/hammer_reblock.c
+++ b/sys/vfs/hammer/hammer_reblock.c@ -571,7 +571,7
@ hammer_reblock_int_node(struct hammer_ioc_reblock *reblock,
int error;
hammer_node_lock_init(&lockroot, cursor->node);
- error = hammer_btree_lock_children(cursor, 1, &lockroot, NULL);
+ error = hammer_btree_lock_children(cursor, 1, 0, &lockroot, NULL);
if (error)
goto done;
--
2.4.6
Files
Updated by tkusumi over 9 years ago
- Status changed from New to Closed
10:28 (dillon) tkusumi: I looked at that one and followed up on bugs. The patch looks ok but I would rather it not be applied. The btree code is extremely sensitive and it is locking all the children as a safety
10:29 (tkusumi) dillon: ok then i'll not commit it. i wanted to make sure if you see this as too sensitive.
10:29 (dillon) its definitely too sensitive :-)
10:29 (dillon) I have had nothing but trouble trying to optimize the btree operations in hammer1
10:30 (dillon) splits don't happen all that often, so there's no performance benefit to optimizing the locking there. Its best for it to just lock everything
10:31 (tkusumi) dillon: "I looked at that one and followed up on bugs" i don't see your followup on bug#2842. am i missing something ?
10:31 (tkusumi) i only see my post.
10:31 (dillon) hrm. I replied to the email. I don't know why it didn't automatically add it to the ticket
10:31 (dillon) grumble. something else I probably have to fix in the bug reporting system
10:33 (tkusumi) the other thing i thought about this func is that shouldn't you be doing
10:33 (tkusumi) error = hammer_cursor_upgrade(cursor)
10:33 (tkusumi) before
10:33 (tkusumi) hammer_node_lock_init(&lockroot, cursor->node); error = hammer_btree_lock_children(cursor, 1, split, &lockroot, NULL);
10:33 (tkusumi) ?
10:34 (tkusumi) i mean lock parent before its children.
10:34 (dillon) what line ?
10:34 (tkusumi) the beginning part of btree_split_internal().
10:35 (tkusumi) 1453
10:35 (tkusumi) rebalance and reblock gets node exlock before node's children's lock.
10:35 (tkusumi) but this split internal gets children locks first and then node.
10:36 (dillon) ah. there's deadlock detection in that sequence, so no. The reason I don't upgrade the parent first in this situation is because it can cause stalls in other threads due to the time it takes to lock the children
10:36 (tkusumi) (no actually reblock does)
10:37 (dillon) the cursor is already locked shared
10:37 (dillon) The upgrade can fail regardless of order due to already being locked shared