Bug #1024

panic: assertion: dd->uschedcp != lp in bsd4_resetpriority

Added by qhwt+dfly almost 6 years ago. Updated almost 6 years ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

Hi.

I'm seeing this panic several times recently, only on an SMP kernel
running on VMware Fusion, with dual CPU support (not when the
number of active processors available to guest OS is 1). It occurs
under load, for instance when building kernel or world. I don't remember
when this started, so it may or may not be VMware's problem. I need to
try it on a real SMP hardware. Anyway, ...

(kgdb) bt
#0 dumpsys () at ./machine/thread.h:83
#1 0xc02bc881 in boot (howto=256)
at /dfsrc/current/sys/kern/kern_shutdown.c:375
#2 0xc02bcb44 in panic (fmt=0xc0507fb7 "assertion: %s in %s")
at /dfsrc/current/sys/kern/kern_shutdown.c:800
#3 0xc02c2515 in bsd4_resetpriority (lp=0xcadf8900)
at /dfsrc/current/sys/kern/usched_bsd4.c:792
#4 0xc02c26f8 in bsd4_recalculate_estcpu (lp=0xcadf8900)
at /dfsrc/current/sys/kern/usched_bsd4.c:701
#5 0xc02ca0ef in schedcpu_stats (p=0xcae6f858, data=0x0)
at /dfsrc/current/sys/kern/kern_synch.c:212
#6 0xc02b68e6 in allproc_scan (callback=0xc02ca099 <schedcpu_stats>, data=0x0)
at /dfsrc/current/sys/kern/kern_proc.c:533
#7 0xc02c9d52 in schedcpu (arg=0x0)
at /dfsrc/current/sys/kern/kern_synch.c:186
#8 0xc02cce3a in softclock_handler (arg=0xc060c7e0)
at /dfsrc/current/sys/kern/kern_timeout.c:308
#9 0xc02c450b in lwkt_deschedule_self (td=Cannot access memory at address 0x8
)
at /dfsrc/current/sys/kern/lwkt_thread.c:223

At first I thought that other CPU has just modified after this CPU
has unlocked bsd4_spin but before KKASSERT(), so I tried to defer
spin_unlock_wr() as done in bsd4_setrunqueue():

%%%
diff --git a/sys/kern/usched_bsd4.c b/sys/kern/usched_bsd4.c
index b934e3d..e3478a0 100644
--- a/sys/kern/usched_bsd4.c
+++ b/sys/kern/usched_bsd4.c
@@ -779,7 +779,6 @@ bsd4_resetpriority(struct lwp *lp)
lp->lwp_priority = newpriority;
reschedcpu = -1;
}
- spin_unlock_wr(&bsd4_spin);

/*
* Determine if we need to reschedule the target cpu. This only
@@ -789,9 +788,14 @@ bsd4_resetpriority(struct lwp *lp)
*/
if (reschedcpu >= 0) {
dd = &bsd4_pcpu[reschedcpu];
- KKASSERT(dd->uschedcp != lp);
+ if (dd->uschedcp == lp) {
+ kprintf("%p(%d): dd->uschedcp=lp=%p\n",
+ curthread, mycpu->gd_cpuid, lp);
+ goto out;
+ }
if ((dd->upri & ~PRIMASK) > (lp->lwp_priority & ~PRIMASK)) {
dd->upri = lp->lwp_priority;
+ spin_unlock_wr(&bsd4_spin);
#ifdef SMP
if (reschedcpu == mycpu->gd_cpuid) {
need_user_resched();
@@ -802,8 +806,12 @@ bsd4_resetpriority(struct lwp *lp)
#else
need_user_resched();
#endif
+ crit_exit();
+ return;
}
}
+out:
+ spin_unlock_wr(&bsd4_spin);
crit_exit();
}

%%%

This seemed to cease the assertion, but adding debugging stuff
(like kprintf() or mycpu) also seemed to avoid the assertion
(or made it diffcult to reproduce), so I'm not 100% sure this
is enough, but other places manipulating bsd4_pcpu[] include:

bsd4_acquire_curproc:252: only when dd->uschedcp == NULL
bsd4_release_curproc:321: only when dd->uschedcp == lp
bsd4_setrunqueue:463: only when gd == mycpu
bsd4_schedulerclock:581: only modifier of rrcount

which don't seem to need spinlocks (maybe, correct me if I'm wrong).

Cheers.

History

#1 Updated by dillon almost 6 years ago

:Hi.
:
:I'm seeing this panic several times recently, only on an SMP kernel
:running on VMware Fusion, with dual CPU support (not when the
:number of active processors available to guest OS is 1). It occurs
:under load, for instance when building kernel or world. I don't remember
:when this started, so it may or may not be VMware's problem. I need to
:try it on a real SMP hardware. Anyway, ...
:...
:
:At first I thought that other CPU has just modified after this CPU
:has unlocked bsd4_spin but before KKASSERT(), so I tried to defer
:spin_unlock_wr() as done in bsd4_setrunqueue():
:...
:This seemed to cease the assertion, but adding debugging stuff
:(like kprintf() or mycpu) also seemed to avoid the assertion
:(or made it diffcult to reproduce), so I'm not 100% sure this
:is enough, but other places manipulating bsd4_pcpu[] include:
:
: bsd4_acquire_curproc:252: only when dd->uschedcp == NULL
: bsd4_release_curproc:321: only when dd->uschedcp == lp
: bsd4_setrunqueue:463: only when gd == mycpu
: bsd4_schedulerclock:581: only modifier of rrcount
:
:which don't seem to need spinlocks (maybe, correct me if I'm wrong).
:
:Cheers.

I think you basically found the bug, and it does make sense that
kprintf's could disrupt the race that causes it because what you found
is a true 'SMP' race between two cpus.

Almost anything related to the globals (around line 146) is supposed to
be spin-locked. Numerous elements related to the per-cpu data
(bsd4_pcpu[]), when operating on the current cpu, should not require
spin locking. That is the idea anyway.

Setting and NULLing dd->uschedcp in the per-cpu data, for the current
cpu, is not intended to need a spin lock. Similarly the per-cpu
rrcount field is a heuristic only used by the current cpu and does not
need a spin lock.

Scanning the global scheduler queue is intended to require a spinlock,
since all the cpus are 'pulling' from the same global queue.

The code you found the bug appears to be bsd4_resetpriority() when
called on a 'random' lwp instead of the 'current' lwp on the calling
cpu. That is, when the softclock does the allproc_scan(). That
scan only occurs on one cpu, with the MP lock held (I think), but can
race against manipulation of the per-cpu scheduler data (dd->*) on other
cpus. Such manipulate can occur on other cpus without the MP lock held
and without the spinlock held.

Because the LWP in the failing path (the one doing the allproc_scan) is
potentially owned by some other cpu, that KKASSERT() can race another
cpu and cause a panic. The bug is actually the presence of the KKASSERT
and not the manipulation of the spin lock.

Moving the spinlock thus might not completely solve the problem.

I think all that needs to be done here is to remove the KKASSERT(). The
worst that happens is that a cpu gets an extra need_resched IPI (which
does no harm), or dd->upri winds up with the wrong value, which should
also do no real harm. I believe the KKASSERT itself is wrong. Instead,
augment the code comment above that section to indicate that the
dd_uschedcp variable can be ripped out from under the code and
cause a spurious need_user_resched() on the target cpu and caused
dd->upri to be wrong for a short period of time.

-Matt
Matthew Dillon
<>

#2 Updated by qhwt+dfly almost 6 years ago

On Thu, May 29, 2008 at 10:27:08AM -0700, Matthew Dillon wrote:
...
> The code you found the bug appears to be bsd4_resetpriority() when
> called on a 'random' lwp instead of the 'current' lwp on the calling
> cpu. That is, when the softclock does the allproc_scan(). That
> scan only occurs on one cpu, with the MP lock held (I think), but can
> race against manipulation of the per-cpu scheduler data (dd->*) on other
> cpus. Such manipulate can occur on other cpus without the MP lock held
> and without the spinlock held.
>
> Because the LWP in the failing path (the one doing the allproc_scan) is
> potentially owned by some other cpu, that KKASSERT() can race another
> cpu and cause a panic. The bug is actually the presence of the KKASSERT
> and not the manipulation of the spin lock.
>
> Moving the spinlock thus might not completely solve the problem.
>
> I think all that needs to be done here is to remove the KKASSERT(). The
> worst that happens is that a cpu gets an extra need_resched IPI (which
> does no harm), or dd->upri winds up with the wrong value, which should
> also do no real harm. I believe the KKASSERT itself is wrong. Instead,
> augment the code comment above that section to indicate that the
> dd_uschedcp variable can be ripped out from under the code and
> cause a spurious need_user_resched() on the target cpu and caused
> dd->upri to be wrong for a short period of time.

... assuming that I'm expected to augment the comment, does this look OK?

* Remove KKASSERT() from the code block where not all callers' CPU own
the LWP, occasionally leading to a panic because of race between CPUs.

--- usched_bsd4.c.orig 2008-05-15 16:15:08.000000000 +0900
+++ usched_bsd4.c 2008-06-18 17:54:07.000000000 +0900
@@ -786,10 +786,16 @@
* occurs if the LWP is already on a scheduler queue, which means
* that idle cpu notification has already occured. At most we
* need only issue a need_user_resched() on the appropriate cpu.
+ *
+ * The LWP may be owned by a CPU different from the current one,
+ * in which case dd->uschedcp may be modified without an MP lock
+ * or a spinlock held. The worst that happens is that the code
+ * below causes a spurious need_user_resched() on the target CPU
+ * and dd->pri to be wrong for a short period of time, both of
+ * which are harmless.
*/
if (reschedcpu >= 0) {
dd = &bsd4_pcpu[reschedcpu];
- KKASSERT(dd->uschedcp != lp);
if ((dd->upri & ~PRIMASK) > (lp->lwp_priority & ~PRIMASK)) {
dd->upri = lp->lwp_priority;
#ifdef SMP

Thanks.

#3 Updated by dillon almost 6 years ago

:... assuming that I'm expected to augment the comment, does this look OK?
:

Looks fine.

-Matt

#4 Updated by qhwt+dfly almost 6 years ago

commmitted.

Also available in: Atom PDF