Bug #930

softdep panic while low on memory during "make buildworld"

Added by nthery over 6 years ago. Updated over 6 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Description

Hello,

I got a "softdep_deallocate_dependencies: dangling deps" panic during
"make buildworld" on a 64mb (virtual) machine. I'm running HEAD
modified to allocate proc structs off an objcache instead of a zone.
The panic is not reproducible with a stock HEAD build, so presumably
the objcache has a bigger footprint but that's another story.

softdep_deallocate_dependencies() is called from brelse():1088:

1085	if (bp->b_flags & (B_DELWRI | B_LOCKED)) {
1086		bp->b_flags &= ~B_RELBUF;
1087	} else if (vm_page_count_severe()) {
1088		buf_deallocate(bp);
1089		if (bp->b_flags & (B_DELWRI | B_LOCKED))
1090			bp->b_flags &= ~B_RELBUF;
1091		else
1092			bp->b_flags |= B_RELBUF;
1093	}

The call to buf_deallocate() was added in vfs_bio.c 1.96.

It looks like softdep_deallocate_dependencies() systematically panics.
Maybe it should take less drastic action?

Cheers,
Nicolas

History

#1 Updated by dillon over 6 years ago

:Hello,
:
:I got a "softdep_deallocate_dependencies: dangling deps" panic during
:"make buildworld" on a 64mb (virtual) machine. I'm running HEAD
:modified to allocate proc structs off an objcache instead of a zone.
:The panic is not reproducible with a stock HEAD build, so presumably
:the objcache has a bigger footprint but that's another story.
:
:softdep_deallocate_dependencies() is called from brelse():1088:

This has happened to me on a virtual machine too.

First, some background... softdeps holds onto buffers which have
dependencies, so if buf_deallocate() is EVER called on a softdep
bp, all softdep's callback does is panic because the situation is
not supposed to happen.

I haven't been able to track it down yet. The panic does not occur
very often at all and I was in the middle of some HAMMER debugging
when I got it and didn't want to switch modes, so to speak :-). It
has only happened to me once so far.

The VKERNEL's timing is very different from a normal kernel, and
most people run it with significantly less 'memory' than a real
kernel (I specify 64MB for the one I test HAMMER in, similar to
the 64MB vkernel you were using), and those differences tend to bring
out software bugs that are not normally apparent when running a
real kernel.

I am virtually certain that it is a bug in softdeps, probably related
to its low memory code which is almost never exercised on a normal kernel.
I don't know any more than that at the moment.

-Matt

#3 Updated by pavalos over 6 years ago

It happens to me on a real live kernel.

I have 2G of RAM, and I'm able to reproduce this if I really stress the
machine. I've turned off softupdates since this was killing my
productivity.

#4 Updated by dillon over 6 years ago

:I have 2G of RAM, and I'm able to reproduce this if I really stress the
:machine. I've turned off softupdates since this was killing my
:productivity.

Could you try to get a kernel core? Maybe there's something else going
on here that can be easily tracked down. I don't recall softupdates
causing this panic in the past. Maybe I broke something when I
augmented the biodeps code.

-Matt
Matthew Dillon

#5 Updated by pavalos over 6 years ago

Oh yeah, they're already on leaf. Look at ~pavalos/crash/*10 and *11.

I thought it might be a problem w/ the aic7xxx locking I was doing, which
is why I originally posted here:

http://leaf.dragonflybsd.org/mailarchive/kernel/2008-01/msg00054.html

#6 Updated by nthery over 6 years ago

I noticed that bp->b_dep is an empty list. So it looks like there are
no dangling dependencies when buf_deallocate() is called.

I also noticed that bp->b_ops is set by hammerfs and ffs but never
reset to NULL. So maybe the faulty buffer had initially some
dependencies that were resolved and the buffer was later on recycled
for another operation without dependency.
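
To make the hypothesis concrete, here is a minimal user-space sketch of it. Everything in it is a hypothetical stand-in (struct dep, struct model_bio_ops, the model_* functions, and the return value of 1 in place of the panic); it is not the kernel code, only a model of a buffer whose b_ops outlives its last dependency:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel definitions. */
struct dep {
	struct dep *next;
};

struct buf;

struct model_bio_ops {
	/* Models the softdep hook: returns 1 where the kernel would panic. */
	int (*deallocate)(struct buf *bp);
};

struct buf {
	struct dep *b_dep;			/* dependency list head, NULL when empty */
	const struct model_bio_ops *b_ops;	/* set by the filesystem, never reset */
};

static int
model_softdep_deallocate(struct buf *bp)
{
	(void)bp;
	return 1;	/* unconditional, like the "dangling deps" panic */
}

static const struct model_bio_ops model_softdep_ops = {
	model_softdep_deallocate
};

/* The filesystem attaches a dependency and installs its ops on the buffer. */
static void
model_add_dep(struct buf *bp, struct dep *d)
{
	assert(bp != NULL && d != NULL);
	d->next = bp->b_dep;
	bp->b_dep = d;
	bp->b_ops = &model_softdep_ops;
}

/* Resolve the first dependency; note that b_ops is left untouched. */
static void
model_resolve_dep(struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_dep != NULL)
		bp->b_dep = bp->b_dep->next;
}

/* Unguarded call, modeled on the low-memory branch of brelse(). */
static int
model_buf_deallocate(struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_ops != NULL)
		return bp->b_ops->deallocate(bp);
	return 0;
}
```

In this model, a buffer that had one dependency added and then resolved still reaches the hook through the stale b_ops, even though its b_dep list is empty.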

WARNING: I have very limited knowledge of softupdates, so don't
try the following!
To check this hypothesis, I applied the following crude patch and I'm
rebuilding world right now on my 64mb machine:

Index: src2/sys/vfs/ufs/ffs_softdep.c
===================================================================
--- src2.orig/sys/vfs/ufs/ffs_softdep.c 2008-01-05 15:02:41.000000000 +0100
+++ src2/sys/vfs/ufs/ffs_softdep.c 2008-01-22 13:51:29.000000000 +0100
@@ -5137,6 +5137,10 @@
 static void
 softdep_deallocate_dependencies(struct buf *bp)
 {
+	/* XXX */
+	if (LIST_EMPTY(&bp->b_dep))
+		return;
+
 	if ((bp->b_flags & B_ERROR) == 0)
 		panic("softdep_deallocate_dependencies: dangling deps");
 	softdep_error(bp->b_vp->v_mount->mnt_stat.f_mntfromname, bp->b_error);

Should that hypothesis prove true, it may be better to reset
bp->b_ops when the last dependency is resolved.
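
A rough user-space sketch of that alternative, for illustration only. The types and the model_* helpers are hypothetical and b_ops is treated as an opaque pointer; whether clearing it is safe for every bioops user is not something this sketch establishes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct buf. */
struct dep {
	struct dep *next;
};

struct buf {
	struct dep *b_dep;	/* dependency list head, NULL when empty */
	const void *b_ops;	/* opaque here; non-NULL means callbacks installed */
};

/* Attach a dependency and install the (opaque) ops pointer. */
static void
model_add_dep(struct buf *bp, struct dep *d, const void *ops)
{
	assert(bp != NULL && d != NULL);
	d->next = bp->b_dep;
	bp->b_dep = d;
	bp->b_ops = ops;
}

/* Resolve the first dependency and drop b_ops once the list is empty. */
static void
model_resolve_dep(struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_dep != NULL)
		bp->b_dep = bp->b_dep->next;
	if (bp->b_dep == NULL)
		bp->b_ops = NULL;
}
```

With this variant a recycled buffer carries no callbacks once its dependencies are gone, so a later deallocate call would have nothing to invoke.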

Let's wait for that buildworld to complete or panic ...

#7 Updated by nthery over 6 years ago

No panic with this patch.

Upon closer inspection, the other calls to buf_deallocate() (in
brelse() and getnewbuf()) are done if and only if b_dep is not empty.
So a better patch would be:

Index: src2/sys/kern/vfs_bio.c
===================================================================
--- src2.orig/sys/kern/vfs_bio.c 2008-01-20 19:27:54.000000000 +0100
+++ src2/sys/kern/vfs_bio.c 2008-01-22 17:58:05.000000000 +0100
@@ -1085,7 +1085,8 @@
 	if (bp->b_flags & (B_DELWRI | B_LOCKED)) {
 		bp->b_flags &= ~B_RELBUF;
 	} else if (vm_page_count_severe()) {
-		buf_deallocate(bp);
+		if (LIST_FIRST(&bp->b_dep) != NULL)
+			buf_deallocate(bp);
 		if (bp->b_flags & (B_DELWRI | B_LOCKED))
 			bp->b_flags &= ~B_RELBUF;
 		else

If there are no objections, I'll commit this on Sunday.
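
For anyone who wants to poke at the guarded branch outside the kernel, here is a small user-space model of it. The MODEL_* flag values, the dealloc_calls counter, and the model_* functions are all assumptions made for this sketch, and it assumes vm_page_count_severe() returned true:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this model only. */
#define MODEL_B_DELWRI	0x01
#define MODEL_B_LOCKED	0x02
#define MODEL_B_RELBUF	0x04

struct dep {
	struct dep *next;
};

struct buf {
	struct dep *b_dep;	/* dependency list head, NULL when empty */
	int b_flags;
	int dealloc_calls;	/* counts how often the model hook ran */
};

/* Stand-in for buf_deallocate(): only records that it was called. */
static void
model_buf_deallocate(struct buf *bp)
{
	bp->dealloc_calls++;
}

/* Low-memory branch of brelse() with the guard from the patch. */
static void
model_brelse_lowmem(struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_flags & (MODEL_B_DELWRI | MODEL_B_LOCKED)) {
		bp->b_flags &= ~MODEL_B_RELBUF;
		return;
	}
	if (bp->b_dep != NULL)	/* models LIST_FIRST(&bp->b_dep) != NULL */
		model_buf_deallocate(bp);
	if (bp->b_flags & (MODEL_B_DELWRI | MODEL_B_LOCKED))
		bp->b_flags &= ~MODEL_B_RELBUF;
	else
		bp->b_flags |= MODEL_B_RELBUF;
}
```

A buffer with an empty b_dep goes straight to the flag update without touching the hook, while one with a pending dependency still calls it once.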

#8 Updated by dillon over 6 years ago

:No panic with this patch.
:
:Upon closer inspection, the other calls to buf_deallocate() (in
:brelse() and getnewbuf()) are done if and only if b_dep is not empty.
:So a better patch would be:
:
:Index: src2/sys/kern/vfs_bio.c
:===================================================================
:--- src2.orig/sys/kern/vfs_bio.c 2008-01-20 19:27:54.000000000 +0100
:+++ src2/sys/kern/vfs_bio.c 2008-01-22 17:58:05.000000000 +0100
:@@ -1085,7 +1085,8 @@
: if (bp->b_flags & (B_DELWRI | B_LOCKED)) {
: bp->b_flags &= ~B_RELBUF;
: } else if (vm_page_count_severe()) {
:- buf_deallocate(bp);
:+ if (LIST_FIRST(&bp->b_dep) != NULL)
:+ buf_deallocate(bp);
: if (bp->b_flags & (B_DELWRI | B_LOCKED))
: bp->b_flags &= ~B_RELBUF;
: else
:
:If there are no objections, I'll commit this on sunday.

You found it. I must have muffed it up when I made the changes
to support HAMMER's more sophisticated use of the bioops.

-Matt
Matthew Dillon

#9 Updated by nthery over 6 years ago

Fix committed.
