Bug #2035

livelock in lwkt_setcpu_remote()

Added by masterblaster about 3 years ago. Updated over 2 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Hi all,

I'm currently porting conky (a nice system monitor utility) to dfly, and
when running the binary I constantly run into the following livelock:

lwkt_setcpu_remote: td->td_flags 00800621
lwkt_setcpu_remote: td->td_flags 00800621
lwkt_setcpu_remote: td->td_flags 00800621
...

lwkt_setcpu_remote() in /sys/kern/lwkt_thread.c contains:

    while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK)) {
#ifdef SMP
        lwkt_process_ipiq();
#endif
        cpu_lfence();
        cpu_pause();
        if (--retry == 0) {
            kprintf("lwkt_setcpu_remote: td->td_flags %08x\n",
                    td->td_flags);
            retry = 10000000;
        }
    }

and it never leaves the 'while' loop.
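
The behaviour of that loop can be modelled in userland. The following is only a sketch: the flag bit values, the demo_* names and the max_spins cut-off are assumptions made for illustration and are not taken from sys/thread.h or lwkt_thread.c, and the IPI processing and cpu_lfence()/cpu_pause() calls are left out. It shows why the diagnostic line repeats forever when none of the awaited bits ever clears.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: illustrative bit values, not copied from the kernel headers. */
#define DEMO_TDF_RUNNING       0x00000001u
#define DEMO_TDF_PREEMPT_LOCK  0x00000004u
#define DEMO_WAIT_MASK         (DEMO_TDF_RUNNING | DEMO_TDF_PREEMPT_LOCK)

/* Hypothetical stand-in for the thread structure; only the flags matter. */
struct demo_thread {
    volatile uint32_t td_flags;
};

/*
 * Spin while any bit of 'mask' is set in td->td_flags.
 * Every 'report_every' spins (must be > 0) a diagnostic line is printed,
 * mirroring the retry counter in the loop quoted above.
 * Unlike the kernel loop, this model gives up after 'max_spins' iterations.
 * Returns 1 if the bits cleared, 0 if it gave up; the number of
 * diagnostics printed is stored in *reports.
 */
int
demo_wait_flags_clear(struct demo_thread *td, uint32_t mask,
                      unsigned long report_every, unsigned long max_spins,
                      unsigned long *reports)
{
    unsigned long retry = report_every;
    unsigned long spins;

    *reports = 0;
    for (spins = 0; spins < max_spins; spins++) {
        if ((td->td_flags & mask) == 0)
            return 1;
        if (--retry == 0) {
            printf("demo_wait: td->td_flags %08x\n",
                   (unsigned int)td->td_flags);
            (*reports)++;
            retry = report_every;
        }
    }
    return 0;
}
```

With td_flags set to 0x00800621 and the assumed mask, the low bit stays set, so the function never returns 1 and only prints its diagnostic every report_every spins, which is the pattern seen on the console.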

I'm running v2.9.1.890.g8e5e11-DEVELOPMENT on an Atom D510 board (2
cores, 2 logical cpus each).

I reproduced the problem with a later v2.9.1.951.g6eaa5-DEVELOPMENT
(pulled some hours ago) on VirtualBox with an SMP kernel and 2 cpus
configured (it takes just a matter of seconds).

The problem appears under VirtualBox even with a non-SMP kernel and 1 cpu
configured, so it may not be strictly SMP related.

Below is a console stack trace from the VirtualBox run, with '#if 0'
changed to '#if 1' for debugging purposes as per commit
cfaeae2ac9bf2ac78367308f7070d4a11c1680d2 (td->td_release %p is printed
too):

cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
lwkt_setcpu_remote: td->td_release 0, td->td_flags 00000621
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
lwkt_setcpu_remote: td->td_release 0, td->td_flags 00000621
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 1 ipiq maxed cscount 1 spin 0
pmap_interlock_wait lwkt_setcpu_remote
cpu 0 ipiq maxed cscount 0 spin 0
panic: cpu 0 ipiq maxed
cpuid = 0
Trace beginning at frame 0xcc17ebb8
panic(ffffffff,0,c045dc7f,cc17ebf0,c9825818) at panic+0x1a2
panic(c045dc7f,0,0,0,cc17ec10) at panic+0x1a2
lwkt_process_ipiq_core(1,ff800110,c16bc000,cc019100,0) at lwkt_process_ipiq_core+0xda
lwkt_process_ipiq(cc0ac9d8,1,cc218b54,c03d86c0,cc019100) at lwkt_process_ipiq+0x7e
pmap_interlock_wait(cc019100) at pmap_interlock_wait+0x37
cpu_heavy_restore(ff809000,c01e8b3b,cc0ac9d8,0,0) at cpu_heavy_restore+0x28
lwkt_setcpu_self(ff809000,0,d0c,0,0) at lwkt_setcpu_self+0xb9
sysctl_kern_proc(c04add00,0,0,cc218bf4,cc218bf4) at sysctl_kern_proc+0x294
sysctl_root(cc218bf4,cc218c28,4,cc0ac9d8,0) at sysctl_root+0x127
userland_sysctl(cc218c7c,3,0,bf6fbf48,0) at userland_sysctl+0x128
sys___sysctl(cc218cf0,cc218d00,18,cc0ac9d8,cc0ac9d8) at sys___sysctl+0x8f
syscall2(cc218d40) at syscall2+0x272
Xint0x80_syscall() at Xint0x80_syscall+0x36
Debugger("panic")

CPU0 stopping CPUs: 0x00000002
stopped
Stopped at Debugger+0x3f: movb $0,in_Debugger.7216
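
The two flag words printed in this report, 00800621 (first build) and 00000621 (later debug build), can be compared bit by bit. A minimal sketch follows; the demo_* helper names are hypothetical, and no meaning is assigned to individual bits because the TDF_* definitions are not quoted in this ticket.

```c
#include <stdint.h>

/* Bits that are set in exactly one of the two flag words. */
uint32_t
demo_flags_diff(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Bits that are set in both flag words. */
uint32_t
demo_flags_common(uint32_t a, uint32_t b)
{
    return a & b;
}
```

For the values above, demo_flags_diff() yields 0x00800000 and demo_flags_common() yields 0x00000621, so the two reports differ in a single bit while the rest of the word, including the low bit, is set in both.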

:(

unnamed (1.27 KB) t_dfbsd, 03/24/2011 09:56 PM

History

#1 Updated by dillon about 3 years ago

:Hi all,
:
:I'm currently porting conky (a nice system monitor utility) to dfly, and
:when running the binary I constantly run into the following livelock:
:
:lwkt_setcpu_remote: td->td_flags 00800621
:lwkt_setcpu_remote: td->td_flags 00800621
:lwkt_setcpu_remote: td->td_flags 00800621
:...

Shoot, I had hoped those were fixed. At this point I think I'm going
to have to rewrite the mechanism entirely by adding some assembly to
deal with the case in the thread switch code.

-Matt

#2 Updated by steve about 3 years ago

On Thu, 24 Mar 2011 09:45:01 -0700 (PDT)
Matthew Dillon wrote:

> Shoot, I had hoped those were fixed. At this point I think I'm going
> to have to rewrite the mechanism entirely by adding some assembly to
> deal with the case in the thread switch code.

Is there any likelihood that this is related to my LWKT_WAIT_IPIQ
panic http://bugs.dragonflybsd.org/issue2004 ?

#3 Updated by masterblaster about 3 years ago

Hi Steve,

I also sporadically get 'M_NOWAIT kmalloc failed, trying M_WAITOK' on
boot, but only under VirtualBox. I didn't mention it, as I saw no link
with the livelock problem.

On 3/24/2011, "Steve O'Hara-Smith" wrote:

>On Thu, 24 Mar 2011 09:45:01 -0700 (PDT)
>Matthew Dillon wrote:
>
>
>> Shoot, I had hoped those were fixed. At this point I think I'm going
>> to have to rewrite the mechanism entirely by adding some assembly to
>> deal with the case in the thread switch code.
>
> Is there any likelihood that this is related to my LWKT_WAIT_IPIQ
>panic http://bugs.dragonflybsd.org/issue2004 ?
>
>--
>Steve O'Hara-Smith | Directable Mirror Arrays
>C:>WIN | A better way to focus the sun
>The computer obeys and wins. | licences available see
>You lose and Bill collects. | http://www.sohara.org/

#4 Updated by dillon about 3 years ago

I have pushed work to master that should hopefully fix the problem,
please test!

-Matt
Matthew Dillon

#5 Updated by dillon about 3 years ago

: Is there any likelihood that this is related to my LWKT_WAIT_IPIQ
:panic http://bugs.dragonflybsd.org/issue2004 ?
:
:--
:Steve O'Hara-Smith | Directable Mirror Arrays

Possibly, but that code is another pool of mud that could have its
own issues still.

-Matt
Matthew Dillon

#6 Updated by dillon about 3 years ago

:Hi Steve,
:
:I also sporadically get 'M_NOWAIT kmalloc failed, trying M_WAITOK' on
:boot, but only under VirtualBox. I didn't mention it, as I saw no link
:with the livelock problem.

This is unrelated to the livelocks and probably harmless. It's another
area (the tty clist buffer code) which needs revamping to remove the
memory allocation requirements.

-Matt
Matthew Dillon

#7 Updated by t_dfbsd about 3 years ago

I've been seeing this too but didn't report it because it didn't seem to be
hurting anything.

Tim

On Thu, Mar 24, 2011 at 2:49 PM, Matthew Dillon wrote:

> :Hi Steve,
> :
> :I also sporadically get 'M_NOWAIT kmalloc failed, trying M_WAITOK' on
> :boot, but only under VirtualBox. I didn't mention it, as I saw no link
> :with the livelock problem.
>
> This is unrelated to the livelocks and probably harmless. It's another
> area (the tty clist buffer code) which needs revamping to remove the
> memory allocation requirements.
>
> -Matt
> Matthew Dillon
>

#8 Updated by masterblaster about 3 years ago

Hi Matt,

Thanks for the fast fix! Unfortunately, the following hunk breaks buildworld:

index 91ccb23..180dadf 100644
--- a/sys/vm/vm_map.h
+++ b/sys/vm/vm_map.h
@@ -75,6 +75,9 @@
#ifndef _SYS_TYPES_H_
#include <sys/types.h>
#endif
+#ifndef _SYS_KERNEL_H_
+#include <sys/kernel.h>
+#endif
#ifndef _SYS_TREE_H_
#include <sys/tree.h>
#endif

because sys/kernel.h contains:

#ifndef _KERNEL
#error "This file should not be included by userland programs."
#else

I'm not familiar with the build infrastructure, so I have to stop here.
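
One common way to avoid this kind of breakage is to pull in kernel-only material only when _KERNEL is defined, while keeping the parts userland needs unconditional. The sketch below illustrates that pattern in a single file; it is not the change that was committed to master, and every demo_* name is hypothetical.

```c
/*
 * Hypothetical stand-in for a header shared by kernel and userland.
 * In a real header the guarded section would hold the kernel-only
 * #include and any inlines that depend on it.
 */
#ifdef _KERNEL
#define DEMO_KERNEL_API_VISIBLE 1
/* Kernel-only helper: never seen by userland compiles. */
static inline int
demo_kernel_only_helper(void)
{
    return 1;
}
#else
#define DEMO_KERNEL_API_VISIBLE 0
#endif

/* Declarations usable from both kernel and userland builds. */
struct demo_map {
    int nentries;
};

static inline int
demo_map_count(const struct demo_map *m)
{
    return m->nentries;
}

/* Reports whether the kernel-only section was compiled in. */
int
demo_kernel_api_visible(void)
{
    return DEMO_KERNEL_API_VISIBLE;
}
```

Compiled as an ordinary userland program (without -D_KERNEL), the guarded section is skipped, so no "should not be included by userland programs" style #error can trigger, while demo_map_count() remains available.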

On Thu, 24 Mar 2011 14:45:36 -0700 (PDT)
Matthew Dillon wrote:

> I have pushed work to master that should hopefully fix the problem,
> please test!
>
> -Matt
> Matthew Dillon

#9 Updated by dillon about 3 years ago

:Hi Matt,
:
:Thanks for the fast fix! Unfortunately, the following hunk breaks buildworld:

All right, I think I've fixed it in master. It looks like one of the
userland libraries also wanted to use the inlines in vm_map.h.

-Matt

#10 Updated by steve about 3 years ago

On Thu, 24 Mar 2011 14:47:37 -0700 (PDT)
Matthew Dillon wrote:

>
> : Is there any likelihood that this is related to my LWKT_WAIT_IPIQ
> :panic http://bugs.dragonflybsd.org/issue2004 ?
> :
> :--
> :Steve O'Hara-Smith | Directable Mirror Arrays
>
> Possibly, but that code is another pool of mud that could have its
> own issues still.

Now confirmed not to be related: I'm still getting the panics and
the ssh freezes.

#11 Updated by masterblaster about 3 years ago

Hi Matt,

It's running stably with VKERNEL :) (please note the uptime below).
Tomorrow evening I will do more extensive 'real world' testing with the
Atom D510 mobo.

Again, many thanks for the quick fix!!

And this is what conky looks like with the ncurses interface (some values
are still not correct...):

DragonFly v2.9.1.974.g7dc51-DEVELOPMENT on i386

Uptime: 7h 24m 47s
RAM Usage: 34.6MiB/61.5MiB - 56% ######____
Swap Usage: 0B/0B - No swap% __________
CPU: 52% #####_____
CPU 0: 52% #####_____
CPU 1: 100% #####_____
Processes: 19 Running: 0

File systems:
/ 272MiB/1.97GiB #_________
Networking:
Up: 0B - Down: 0B

Name PID CPU% MEM%
conky 990 0.54 3.94
sendmail 653 0.00 3.37
sendmail 657 0.00 3.34
csh 735 0.00 2.89

On 3/25/2011, "Matthew Dillon" wrote:

>:Hi Matt,
>:
>:Thanks for the fast fix! Unfortunately, the following hunk breaks buildworld:
>
> All right, I think I've fixed it in master. It looks like one of the
> userland libraries also wanted to use the inlines in vm_map.h.
>
> -Matt

#12 Updated by masterblaster about 3 years ago

Hi Matt,

The application that caused the livelock ran stably all night and is still
running :) It's solved (for me), many thanks again!

On Fri, 25 Mar 2011 21:45:26 +0000
"Matthew Dillon (via DragonFly issue tracker)" wrote:

>
> Matthew Dillon added the comment:
>
> :Hi Matt,
> :
> :Thanks for the fast fix! Unfortunately, the following hunk breaks buildworld:
>
> All right, I think I've fixed it in master. It looks like one of the
> userland libraries also wanted to use the inlines in vm_map.h.
>
> -Matt
