Bug #276

closed

Patch to try - Re: Sendmail rset command hangs socket on 1.6-Release

Added by sven over 17 years ago. Updated over 17 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Description

On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:

Please try this patch and tell me if it works. I think we have an issue
when one process holds an exclusive lock while 2 or more processes are
trying to get a shared lock, or vice versa.

-Matt

Index: kern_lockf.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lockf.c,v
retrieving revision 1.32
diff -u -r1.32 kern_lockf.c
--- kern_lockf.c	25 Jul 2006 20:01:50 -0000	1.32
+++ kern_lockf.c	2 Aug 2006 17:23:56 -0000
@@ -772,8 +772,10 @@
 			TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
 			range->lf_flags = 1;
 			wakeup(range);
+#if 0
 			if (range->lf_start >= start && range->lf_end <= end)
 				break;
+#endif
 		}
 	}

I have applied the patch (and recompiled) and am letting the system run
full steam right now (including the milter, etc); the initial results
look promising as it has not exhibited the aberrant behavior as of yet.
I will post a followup after letting this run all night (assuming it
does so) or after it fails (which hopefully won't happen).

Sven
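
The scenario Matt describes (one exclusive holder, two or more shared waiters) can be approximated from userland. The following is a minimal sketch, assuming the locks involved are POSIX fcntl() record locks; the helper names set_lock and count_readers_woken and the lock file path are hypothetical and are not taken from sendmail or the kernel source. On a kernel without the bug, every reader should obtain its lock once the writer releases; a reader that stays blocked is killed by its alarm and is not counted.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Apply a whole-file lock of the given type (F_RDLCK, F_WRLCK, F_UNLCK). */
static int set_lock(int fd, short type, int cmd)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, cmd, &fl);
}

/*
 * Hold an exclusive lock, start `readers` children that block waiting for
 * a shared lock, release the exclusive lock, and return how many children
 * obtained their shared lock. Returns -1 if the setup itself fails.
 */
static int count_readers_woken(const char *path, int readers)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (set_lock(fd, F_WRLCK, F_SETLK) < 0) {
        close(fd);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < readers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* stop forking; still reap those started */
        if (pid == 0) {
            alarm(10);          /* do not hang forever if never woken */
            _exit(set_lock(fd, F_RDLCK, F_SETLKW) == 0 ? 0 : 1);
        }
        started++;
    }

    sleep(1);                   /* give the children time to block */
    set_lock(fd, F_UNLCK, F_SETLK);

    int woken = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    }
    close(fd);
    return woken;
}
```

Calling count_readers_woken("/tmp/lockf_demo.lock", 2) from a small main() and comparing the result with 2 gives a quick pass/fail check. The one-second sleep is only a heuristic to let the children reach the blocking call, so a passing run does not prove the bug is absent.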

Actions #1

Updated by sven about 18 years ago

On Wed, 2006-08-02 at 17:41 -0400, Sven Willenberger wrote:

On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:

Please try this patch and tell me if it works. I think we have an issue
when one process holds an exclusive lock while 2 or more processes are
trying to get a shared lock, or vice versa.

-Matt

Index: kern_lockf.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lockf.c,v
retrieving revision 1.32
diff -u -r1.32 kern_lockf.c
--- kern_lockf.c	25 Jul 2006 20:01:50 -0000	1.32
+++ kern_lockf.c	2 Aug 2006 17:23:56 -0000
@@ -772,8 +772,10 @@
 			TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
 			range->lf_flags = 1;
 			wakeup(range);
+#if 0
 			if (range->lf_start >= start && range->lf_end <= end)
 				break;
+#endif
 		}
 	}

I have applied the patch (and recompiled) and am letting the system run
full steam right now (including the milter, etc); the initial results
look promising as it has not exhibited the aberrant behavior as of yet.
I will post a followup after letting this run all night (assuming it
does so) or after it fails (which hopefully won't happen).

Sven

As a followup, the server has been running without a hitch now for 18
hours so it would appear that the above patch has fixed the situation,
unless some other more rare situation/condition crops up that would
cause this lock.

Sven

Actions #2

Updated by dillon about 18 years ago

:> I have applied the patch (and recompiled) and am letting the system run
:> full steam right now (including the milter, etc); the initial results
:> look promising as it has not exhibited the aberrant behavior as of yet.
:> I will post a followup after letting this run all night (assuming it
:> does so) or after it fails (which hopefully won't happen).
:>
:> Sven
:>
:
:As a followup, the server has been running without a hitch now for 18
:hours so it would appear that the above patch has fixed the situation,
:unless some other more rare situation/condition crops up that would
:cause this lock.
:
:Sven

Ok, that's good to hear. I'll get the patch committed to both HEAD and
REL.

This bug is serious enough to warrant rolling 1.6.1 next week, probably
Monday.

-Matt
Matthew Dillon

Actions #3

Updated by qhwt+dfly about 18 years ago

On Thu, Aug 03, 2006 at 08:56:47AM -0700, Matthew Dillon wrote:

:> I have applied the patch (and recompiled) and am letting the system run
:> full steam right now (including the milter, etc); the initial results
:> look promising as it has not exhibited the aberrant behavior as of yet.
:> I will post a followup after letting this run all night (assuming it
:> does so) or after it fails (which hopefully won't happen).
:>
:> Sven
:>
:
:As a followup, the server has been running without a hitch now for 18
:hours so it would appear that the above patch has fixed the situation,
:unless some other more rare situation/condition crops up that would
:cause this lock.
:
:Sven

Ok, that's good to hear. I'll get the patch committed to both HEAD and
REL.

Does 1.4.x-RELEASE have this problem too?

Actions #4

Updated by dillon about 18 years ago

:
:Does 1.4.x-RELEASE have this problem too?

The lockf code is different in 1.4.  There is a similar test in lf_wakeup,
and it doesn't look right to me, but I don't know if the bug can be
triggered or not.

-Matt
Matthew Dillon

Actions #5

Updated by hamilton about 18 years ago

Matthew Dillon said on Thu Aug 03, 2006 [08:12:27 PM]:
} :
} :Does 1.4.x-RELEASE have this problem too?
}
} The lockf code is different in 1.4. There is a similar test in lf_wakeup,
} and it doesn't look right to me, but I don't know if the bug can be
} triggered or not.

The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.

--

Jon Hamilton

Actions #6

Updated by dillon about 18 years ago

:...
:} and it doesn't look right to me, but I don't know if the bug can be
:} triggered or not.
:
:The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.
:
:--
:
: Jon Hamilton
:

I'll commit a similar patch to 1.4.x to hopefully fix it there.
It looks like I copied the original bug to the new lock code when I
rewrote it during 1.5. I'm not sure when it was first introduced but
it looks like the code attempted to optimize the unblocking code by
breaking out of the loop early in certain situations, but it turns out
the optimization check it was doing was insufficient and it was breaking
out too early.

-Matt
Matthew Dillon

Actions #7

Updated by joerg about 18 years ago

On Fri, Aug 04, 2006 at 09:21:30AM -0700, Matthew Dillon wrote:

It looks like I copied the original bug to the new lock code when I
rewrote it during 1.5. I'm not sure when it was first introduced but
it looks like the code attempted to optimize the unblocking code by
breaking out of the loop early in certain situations, but it turns out
the optimization check it was doing was insufficient and it was breaking
out too early.

Braino. The check was the wrong way around, i.e. it should check whether
the range covers the given [start, end], not the other way around.

Joerg
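
The effect of the early break discussed in this thread can be illustrated with a small userland model. This is a sketch under stated assumptions, not the kernel code: struct waiter, overlaps and wake_blocked are hypothetical names, a plain array stands in for the lf_blocked queue, and a boolean flag stands in for wakeup(). The early_break parameter reproduces the condition that the patch wraps in #if 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked lock request; not the kernel struct. */
struct waiter {
    long start;     /* first byte of the requested range */
    long end;       /* last byte of the requested range (inclusive) */
    bool woken;     /* set once the waiter has been told to retry */
};

/* True when [a_start, a_end] and [b_start, b_end] share at least one byte. */
static bool overlaps(long a_start, long a_end, long b_start, long b_end)
{
    return a_start <= b_end && b_start <= a_end;
}

/*
 * Wake every waiter whose range overlaps the released region [start, end].
 * With early_break set, the loop stops as soon as it has woken a waiter
 * whose range lies inside the released region, which is the condition the
 * patch disables. Returns the number of waiters woken by this call.
 */
static size_t wake_blocked(struct waiter *w, size_t n,
                           long start, long end, bool early_break)
{
    assert(start <= end);
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i].woken || !overlaps(w[i].start, w[i].end, start, end))
            continue;
        w[i].woken = true;
        woken++;
        if (early_break && w[i].start >= start && w[i].end <= end)
            break;  /* later waiters on the same range are never woken */
    }
    return woken;
}
```

With two waiters that both request bytes 0 through 9 and a release of the same region, the early_break variant wakes only the first waiter and leaves the second blocked, while the variant without the break wakes both. That matches the symptom of two or more shared-lock requests waiting behind one exclusive holder.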

Actions #8

Updated by justin about 18 years ago

Fixed in 1.6.1.
