Bug #276
closed
Patch to try - Re: Sendmail rset command hangs socket on 1.6-Release
Description
On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:
Please try this patch and tell me if it works. I think we have an issue
when one process holds an exclusive lock while 2 or more processes are
trying to get a shared lock, or vice versa.
-Matt
Index: kern_lockf.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lockf.c,v
retrieving revision 1.32
diff -u -r1.32 kern_lockf.c
--- kern_lockf.c	25 Jul 2006 20:01:50 -0000	1.32
+++ kern_lockf.c	2 Aug 2006 17:23:56 -0000
@@ -772,8 +772,10 @@
 			TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
 			range->lf_flags = 1;
 			wakeup(range);
+#if 0
 			if (range->lf_start >= start && range->lf_end <= end)
 				break;
+#endif
 		}
 	}
I have applied the patch (and recompiled) and am letting the system run
full steam right now (including the milter, etc); the initial results
look promising as it has not exhibited the aberrant behavior as of yet.
I will post a followup after letting this run all night (assuming it
does so) or after it fails (which hopefully won't happen).
Sven
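The scenario Matt describes (one process holding an exclusive lock while two or more processes wait for a shared lock on the same range) can be exercised from user space. The following is only a minimal sketch: it assumes fcntl()-style byte-range locks, and the thread does not say which locking call sendmail or the milter actually used. The helper names set_lock and count_readers_woken, the 100-byte range, and the timeouts are all made up for illustration.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_READERS 8

/* Apply a lock operation to bytes 0..99 of fd (hypothetical helper). */
static int set_lock(int fd, short type, int cmd)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 100;
    return fcntl(fd, cmd, &fl);
}

/*
 * The parent takes an exclusive lock, nreaders children block waiting for
 * a shared lock on the same range, then the parent unlocks.  Returns how
 * many children obtained the shared lock, or -1 on a setup error.
 */
static int count_readers_woken(const char *path, int nreaders)
{
    pid_t pids[MAX_READERS];
    int fd, i, woken = 0;

    if (nreaders < 1 || nreaders > MAX_READERS)
        return -1;
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (set_lock(fd, F_WRLCK, F_SETLK) < 0) {
        close(fd);
        return -1;
    }

    for (i = 0; i < nreaders; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            int cfd = open(path, O_RDWR);

            /* If the wakeup never arrives, SIGALRM kills this child. */
            alarm(5);
            if (cfd < 0 || set_lock(cfd, F_RDLCK, F_SETLKW) < 0)
                _exit(1);
            _exit(0);
        }
    }

    sleep(1);                          /* give the readers time to block */
    set_lock(fd, F_UNLCK, F_SETLK);    /* release the exclusive lock */

    for (i = 0; i < nreaders; i++) {
        int status;

        if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    }
    close(fd);
    return woken;
}
```

On a kernel that wakes every blocked waiter, count_readers_woken(path, 2) should return 2. A smaller return value would mean at least one reader was left sleeping until its alarm fired, which is the kind of hang the patch is aimed at.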
Updated by sven almost 19 years ago
On Wed, 2006-08-02 at 17:41 -0400, Sven Willenberger wrote:
On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:
Please try this patch and tell me if it works. I think we have an issue
when one process holds an exclusive lock while 2 or more processes are
trying to get a shared lock, or vice versa.
-Matt
Index: kern_lockf.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lockf.c,v
retrieving revision 1.32
diff -u -r1.32 kern_lockf.c
--- kern_lockf.c	25 Jul 2006 20:01:50 -0000	1.32
+++ kern_lockf.c	2 Aug 2006 17:23:56 -0000
@@ -772,8 +772,10 @@
 			TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
 			range->lf_flags = 1;
 			wakeup(range);
+#if 0
 			if (range->lf_start >= start && range->lf_end <= end)
 				break;
+#endif
 		}
 	}
I have applied the patch (and recompiled) and am letting the system run
full steam right now (including the milter, etc); the initial results
look promising as it has not exhibited the aberrant behavior as of yet.
I will post a followup after letting this run all night (assuming it
does so) or after it fails (which hopefully won't happen).
Sven
As a followup, the server has been running without a hitch now for 18
hours so it would appear that the above patch has fixed the situation,
unless some other more rare situation/condition crops up that would
cause this lock.
Sven
Updated by dillon almost 19 years ago
:> I have applied the patch (and recompiled) and am letting the system run
:> full steam right now (including the milter, etc); the initial results
:> look promising as it has not exhibited the aberrant behavior as of yet.
:> I will post a followup after letting this run all night (assuming it
:> does so) or after it fails (which hopefully won't happen).
:>
:> Sven
:>
:
:As a followup, the server has been running without a hitch now for 18
:hours so it would appear that the above patch has fixed the situation,
:unless some other more rare situation/condition crops up that would
:cause this lock.
:
:Sven
Ok, that's good to hear. I'll get the patch committed to both HEAD and
REL.
This bug is serious enough to warrant rolling 1.6.1 next week, probably
Monday.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by qhwt+dfly almost 19 years ago
On Thu, Aug 03, 2006 at 08:56:47AM -0700, Matthew Dillon wrote:
:> I have applied the patch (and recompiled) and am letting the system run
:> full steam right now (including the milter, etc); the initial results
:> look promising as it has not exhibited the aberrant behavior as of yet.
:> I will post a followup after letting this run all night (assuming it
:> does so) or after it fails (which hopefully won't happen).
:>
:> Sven
:>
:
:As a followup, the server has been running without a hitch now for 18
:hours so it would appear that the above patch has fixed the situation,
:unless some other more rare situation/condition crops up that would
:cause this lock.
:
:Sven
Ok, that's good to hear. I'll get the patch committed to both HEAD and
REL.
Does 1.4.x-RELEASE have this problem too?
Updated by dillon almost 19 years ago
:
:Does 1.4.x-RELEASE have this problem too?
The lockf code is different in 1.4. There is a similar test in lf_wakeup,
and it doesn't look right to me, but I don't know if the bug can be
triggered or not.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by hamilton almost 19 years ago
Matthew Dillon <dillon@apollo.backplane.com>, said on Thu Aug 03, 2006 [08:12:27 PM]:
} :
} :Does 1.4.x-RELEASE have this problem too?
}
} The lockf code is different in 1.4. There is a similar test in lf_wakeup,
} and it doesn't look right to me, but I don't know if the bug can be
} triggered or not.
The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.
--
Jon Hamilton
hamilton@pobox.com
Updated by dillon almost 19 years ago
:...
:} and it doesn't look right to me, but I don't know if the bug can be
:} triggered or not.
:
:The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.
:
:--
:
: Jon Hamilton
: hamilton@pobox.com
I'll commit a similar patch to 1.4.x to hopefully fix it there.
It looks like I copied the original bug to the new lock code when I
rewrote it during 1.5. I'm not sure when it was first introduced but
it looks like the code attempted to optimize the unblocking code by
breaking out of the loop early in certain situations, but it turns out
the optimization check it was doing was insufficient and it was breaking
out too early.
-Matt
Matthew Dillon
<dillon@backplane.com>
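The early exit described above can be modelled in user space. This is a simplified sketch, not the kernel code: struct waiter and wake_blocked are hypothetical names, the blocked list is a plain array, and every blocked waiter is assumed to be a candidate for wakeup. Only the range test itself is taken from the patch.

```c
#include <stddef.h>

/* A blocked lock request covering bytes [start, end] (hypothetical type). */
struct waiter {
    long start;
    long end;
    int  woken;
};

/*
 * Wake the waiters blocked on a region [start, end] that was just unlocked.
 * With early_break set, the loop stops after the first waiter whose range
 * lies inside the region, mirroring the test that the patch disables.
 * Returns the number of waiters woken.
 */
static int wake_blocked(struct waiter *w, size_t n,
                        long start, long end, int early_break)
{
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        w[i].woken = 1;
        count++;
        if (early_break && w[i].start >= start && w[i].end <= end)
            break;    /* any waiters after this one stay asleep */
    }
    return count;
}
```

With two shared-lock waiters on the same range [0, 99] and an unlock of [0, 99], the early_break variant wakes one waiter and returns 1, while the variant without the break wakes both and returns 2.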
Updated by joerg almost 19 years ago
On Fri, Aug 04, 2006 at 09:21:30AM -0700, Matthew Dillon wrote:
It looks like I copied the original bug to the new lock code when I
rewrote it during 1.5. I'm not sure when it was first introduced but
it looks like the code attempted to optimize the unblocking code by
breaking out of the loop early in certain situations, but it turns out
the optimization check it was doing was insufficient and it was breaking
out too early.
Braino. The check was backwards, i.e. it should test whether the
range covers the given [start, end], not the other way around.
Joerg
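The two directions of the test can be written side by side. This is a sketch with made-up function names, using inclusive byte ranges. The thread does not establish that re-enabling the early break with the reversed test would be correct; the patch posted above only disables it.

```c
/* The test as it stood in the patched code: is the blocked range
 * [rs, re] entirely inside the region [start, end]? */
static int range_inside_region(long rs, long re, long start, long end)
{
    return rs >= start && re <= end;
}

/* The direction Joerg describes: does the blocked range [rs, re]
 * cover the whole region [start, end]? */
static int range_covers_region(long rs, long re, long start, long end)
{
    return rs <= start && re >= end;
}
```

For a blocked range [0, 100] and a region [10, 20], range_inside_region is false and range_covers_region is true. For a blocked range [12, 15] and the same region the results swap. The two tests agree only when the ranges are identical.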