Bug #276

Patch to try - Re: Sendmail rset command hangs socket on 1.6-Release

Added by sven over 8 years ago. Updated about 8 years ago.

Status: Closed
Start date:
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:
> Please try this patch and tell me if it works. I think we have an issue
> when one process holds an exclusive lock while 2 or more processes are
> trying to get a shared lock, or vice versa.
>
> -Matt
>
> Index: kern_lockf.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_lockf.c,v
> retrieving revision 1.32
> diff -u -r1.32 kern_lockf.c
> --- kern_lockf.c 25 Jul 2006 20:01:50 -0000 1.32
> +++ kern_lockf.c 2 Aug 2006 17:23:56 -0000
> @@ -772,8 +772,10 @@
> TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
> range->lf_flags = 1;
> wakeup(range);
> +#if 0
> if (range->lf_start >= start && range->lf_end <= end)
> break;
> +#endif
> }
> }
>

I have applied the patch (and recompiled) and am letting the system run
full steam right now (including the milter, etc); the initial results
look promising as it has not exhibited the aberrant behavior as of yet.
I will post a followup after letting this run all night (assuming it
does so) or after it fails (which hopefully won't happen).

Sven

History

#1 Updated by sven over 8 years ago

On Wed, 2006-08-02 at 17:41 -0400, Sven Willenberger wrote:
> On Wed, 2006-08-02 at 10:25 -0700, Matthew Dillon wrote:
> > Please try this patch and tell me if it works. I think we have an issue
> > when one process holds an exclusive lock while 2 or more processes are
> > trying to get a shared lock, or vice versa.
> >
> > -Matt
> >
> > Index: kern_lockf.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_lockf.c,v
> > retrieving revision 1.32
> > diff -u -r1.32 kern_lockf.c
> > --- kern_lockf.c 25 Jul 2006 20:01:50 -0000 1.32
> > +++ kern_lockf.c 2 Aug 2006 17:23:56 -0000
> > @@ -772,8 +772,10 @@
> > TAILQ_REMOVE(&lock->lf_blocked, range, lf_link);
> > range->lf_flags = 1;
> > wakeup(range);
> > +#if 0
> > if (range->lf_start >= start && range->lf_end <= end)
> > break;
> > +#endif
> > }
> > }
> >
>
> I have applied the patch (and recompiled) and am letting the system run
> full steam right now (including the milter, etc); the initial results
> look promising as it has not exhibited the aberrant behavior as of yet.
> I will post a followup after letting this run all night (assuming it
> does so) or after it fails (which hopefully won't happen).
>
> Sven
>

As a followup, the server has been running without a hitch now for 18
hours so it would appear that the above patch has fixed the situation,
unless some other more rare situation/condition crops up that would
cause this lock.

Sven

#2 Updated by dillon over 8 years ago

:> I have applied the patch (and recompiled) and am letting the system run
:> full steam right now (including the milter, etc); the initial results
:> look promising as it has not exhibited the aberrant behavior as of yet.
:> I will post a followup after letting this run all night (assuming it
:> does so) or after it fails (which hopefully won't happen).
:>
:> Sven
:>
:
:As a followup, the server has been running without a hitch now for 18
:hours so it would appear that the above patch has fixed the situation,
:unless some other more rare situation/condition crops up that would
:cause this lock.
:
:Sven

Ok, that's good to hear. I'll get the patch committed to both HEAD and
REL.

This bug is serious enough to warrant rolling 1.6.1 next week, probably
Monday.

-Matt
Matthew Dillon

#3 Updated by qhwt+dfly over 8 years ago

On Thu, Aug 03, 2006 at 08:56:47AM -0700, Matthew Dillon wrote:
>
> :> I have applied the patch (and recompiled) and am letting the system run
> :> full steam right now (including the milter, etc); the initial results
> :> look promising as it has not exhibited the aberrant behavior as of yet.
> :> I will post a followup after letting this run all night (assuming it
> :> does so) or after it fails (which hopefully won't happen).
> :>
> :> Sven
> :>
> :
> :As a followup, the server has been running without a hitch now for 18
> :hours so it would appear that the above patch has fixed the situation,
> :unless some other more rare situation/condition crops up that would
> :cause this lock.
> :
> :Sven
>
> Ok, that's good to hear. I'll get the patch committed to both HEAD and
> REL.

Does 1.4.x-RELEASE have this problem too?

#4 Updated by dillon over 8 years ago

:
:Does 1.4.x-RELEASE have this problem too?

The lockf code is different in 1.4. There is a similar test in lf_wakeup,
and it doesn't look right to me, but I don't know if the bug can be
triggered or not.

-Matt
Matthew Dillon

#5 Updated by hamilton over 8 years ago

Matthew Dillon said on Thu Aug 03, 2006 [08:12:27 PM]:
} :
} :Does 1.4.x-RELEASE have this problem too?
}
} The lockf code is different in 1.4. There is a similar test in lf_wakeup,
} and it doesn't look right to me, but I don't know if the bug can be
} triggered or not.

The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.

--

Jon Hamilton

#6 Updated by dillon over 8 years ago

:...
:} and it doesn't look right to me, but I don't know if the bug can be
:} triggered or not.
:
:The postfix problem I had was present on 1.4.x as well as 1.6 and 1.7.
:
:--
:
: Jon Hamilton
:

I'll commit a similar patch to 1.4.x to hopefully fix it there.

It looks like I copied the original bug to the new lock code when I
rewrote it during 1.5. I'm not sure when it was first introduced but
it looks like the code attempted to optimize the unblocking code by
breaking out of the loop early in certain situations, but it turns out
the optimization check it was doing was insufficient and it was breaking
out too early.

-Matt
Matthew Dillon
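
The early break described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct waiter` and `wake_blocked()` are hypothetical names, the surrounding loop is assumed, and only the range test is taken from the patch. It shows how stopping after the first waiter whose range lies inside the released [start, end] can leave a second waiter on the same range asleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one blocked lock request (not the kernel struct). */
struct waiter {
    long start;  /* first byte of the requested range */
    long end;    /* last byte of the requested range */
    bool woken;  /* set once the waiter has been woken */
};

/*
 * Wake waiters after the range [start, end] has been released.
 * With early_break set, stop after the first woken waiter whose range
 * lies inside [start, end]; this mimics the check the patch wraps in #if 0.
 * Returns the number of waiters woken.
 */
static size_t
wake_blocked(struct waiter *w, size_t n, long start, long end,
             bool early_break)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        w[i].woken = true;
        woken++;
        if (early_break && w[i].start >= start && w[i].end <= end)
            break;  /* any remaining waiters are never woken */
    }
    return woken;
}
```

With two waiters both blocked on [0, 9] and a release of [0, 9], the early-break variant wakes only the first one, while the patched behaviour (no early break) wakes both.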

#7 Updated by joerg over 8 years ago

On Fri, Aug 04, 2006 at 09:21:30AM -0700, Matthew Dillon wrote:
> It looks like I copied the original bug to the new lock code when I
> rewrote it during 1.5. I'm not sure when it was first introduced but
> it looks like the code attempted to optimize the unblocking code by
> breaking out of the loop early in certain situations, but it turns out
> the optimization check it was doing was insufficient and it was breaking
> out too early.

Braino. The check was the wrong way around, i.e. it should check whether
the range covers the given [start, end], not the other way around.

Joerg
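
To make the two directions of the test concrete, here is a minimal sketch in plain C. The function names are hypothetical and not taken from kern_lockf.c; the first predicate has the same shape as the test disabled by the patch, the second is the direction suggested above.

```c
#include <stdbool.h>

/* True if the waiter's range [rs, re] lies inside [start, end].
 * Same shape as the test the patch disables. */
static bool
waiter_inside_released(long rs, long re, long start, long end)
{
    return rs >= start && re <= end;
}

/* True if the waiter's range [rs, re] covers all of [start, end].
 * This is the reversed direction of the test. */
static bool
waiter_covers_released(long rs, long re, long start, long end)
{
    return rs <= start && re >= end;
}
```

For a waiter on [0, 99] and a released range [10, 20], the first predicate is false and the second is true; for a waiter on [12, 15] the results are reversed.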

#8 Updated by justin over 8 years ago

Fixed in 1.6.1.
