Bug #1325

ddb stops responding after resizing terminal window of vkernel

Added by Beket over 5 years ago. Updated over 5 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Greetings everyone.

When I resize the terminal window, while being at the ddb prompt inside a
vkernel, the ddb stops responding. That is, I hit keys but they don't appear at
the prompt. The vkernel is in a 'pause' state and the only way to shut it down
is by kill'ing -9 it. Mind that this only happens if I pass >1 as the number of
CPUs to emulate, so it must be some kind of SMP issue. The problem is 100%
reproducible, but I couldn't find a pattern of resizes to trigger. Still, 3-5
vertical resizes up/down usually suffice.

Cheers,
Stathis

History

#1 Updated by Beket over 5 years ago

: ... and the only way to shut it down is by kill'ing -9 it
josepht hinted that you can just Ctrl+C it as well.

Cheers,
Stathis

#2 Updated by dillon over 5 years ago

:New submission from Stathis Kamperis:
:
:Greetings everyone.
:
:When I resize the terminal window, while being at the ddb prompt inside a
:vkernel, the ddb stops responding. That is, I hit keys but they don't appear at
:the prompt. The vkernel is in a 'pause' state and the only way to shut it down
:is by kill'ing -9 it. Mind that this only happens if I pass >1 as the number of
:CPUs to emulate, so it must be some kind of SMP issue. The problem is 100%
:reproducible, but I couldn't find a pattern of resizes to trigger. Still, 3-5
:vertical resizes up/down usually suffice.
:
:Cheers,
:Stathis

So far I have not been able to reproduce the issue. Here's a patch
that just adds some debugging write()'s to the SIGWINCH path in the
vkernel.

When running normally you should see "WXYZ" appear on the console.
If you resize while you are in DDB you should only see "W" appear,
then when you 'cont' from DDB you should see the "XYZ".

If while in DDB the "XYZ" or even just the "X" occurs, then pending
interrupts are improperly being processed while in DDB mode. The
kernel will not block the actual SIGWINCH signal (hence the "W"
appears), but the interrupt made pending by signalintr() should
not run until normal operation is resumed via 'cont'.

So the question is, what do you get when you get the freeze?

-Matt
Matthew Dillon

diff --git a/sys/platform/vkernel/platform/console.c b/sys/platform/vkernel/platform/console.c
index b4088ab..736325c 100644
--- a/sys/platform/vkernel/platform/console.c
+++ b/sys/platform/vkernel/platform/console.c
@@ -274,6 +274,7 @@ vconssignal(int sig)
 static void
 vconswinchsig(int __unused sig)
 {
+	write(2, "W", 1);
 	signalintr(3);
 }

@@ -282,6 +283,7 @@ vconswinch_intr(void *arg __unused, void *frame __unused)
 {
 	struct winsize newsize;

+	write(2, "X", 1);
 	if (vconsole != NULL && vconsole->cn_dev->si_tty != NULL) {
 		ioctl(0, TIOCGWINSZ, &newsize);
 		/*
@@ -292,7 +294,9 @@ vconswinch_intr(void *arg __unused, void *frame __unused)
 		if (bcmp(&newsize, &vconsole->cn_dev->si_tty->t_winsize,
 			 sizeof(newsize)) != 0) {
 			vconsole->cn_dev->si_tty->t_winsize = newsize;
+			write(2, "Y", 1);
 			pgsignal(vconsole->cn_dev->si_tty->t_pgrp, SIGWINCH, 1);
+			write(2, "Z", 1);
 		}
 	}
 }

#3 Updated by Beket over 5 years ago

: When running normally you should see "WXYZ" appear on the console.
True, e.g.

Welcome to DragonFly!

vkernel# WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ

:If you resize while you are in DDB you should only see "W" appear,
True (but see below), e.g.

CPU1 stopping CPUs: 0x00000001
stopped
Stopped at 0x8213929: movb $0,0x835c0d4
db> WWWWWWWWWWWWWWWWWWW

:then when you 'cont' from DDB you should see the "XYZ".
Not really, e.g.

db> c

CPU1 restarting CPUs: 0x00000001
restarted
WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ

:If while in DDB the "XYZ" or even just the "X" occurs, then pending
:interrupts are improperly being processed while in DDB mode.
Not encountered.

Though, when I resize inside DDB and get the 'W', the problem manifests. I can't
type anything at the ddb prompt, whereas the 'W's show up fine.

Does anyone else see this ?

Cheers,
Stathis

#4 Updated by dillon over 5 years ago

:Stathis Kamperis added the comment:
:...
:
:CPU1 stopping CPUs: 0x00000001
: stopped
:Stopped at 0x8213929: movb $0,0x835c0d4
:db> WWWWWWWWWWWWWWWWWWW
:
:
::then when you 'cont' from DDB you should see the "XYZ".
:Not really, e.g.
:
:db> c
:
:CPU1 restarting CPUs: 0x00000001
: restarted
:WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ

I get a single XYZ when I cont (without doing any additional
window resizing). You got 11 of them?

What is the exact command line you are using to start your
vkernel up?

:Though, when I resize inside DDB and get the 'W', the problem manifests. I can't
:type anything at the ddb prompt, whereas the 'W's show up fine.
:
:Does anyone else see this ?
:
:Cheers,
:Stathis

Very odd. I wonder if the terminal program you are using has
something to do with it. Are you using an xterm or something
else? Try using an xterm if you aren't and see if it still
occurs.

-Matt

#5 Updated by Beket over 5 years ago

: Very odd. I wonder if the terminal program you are using has
: something to do with it. Are you using an xterm or something
: else? Try using an xterm if you aren't and see if it still
: occurs.
I'm using xterm.

Anyway, it's not that serious. I'm going to investigate it a bit myself. In the
meantime if anyone else can try to reproduce it, it would be great.

Cheers,
Stathis

#6 Updated by josepht over 5 years ago

I get the same freeze. It's looping in lwkt_send_ipiq3() inside this
while loop:

while (ip->ip_windex - ip->ip_rindex > MAXCPUFIFO / 4) {
KKASSERT(ip->ip_windex - ip->ip_rindex != MAXCPUFIFO - 1);
lwkt_process_ipiq();
}

I haven't yet tried Matt's debug patch. I don't get the freeze
immediately, only after the 8th time that read() in vconsgetc() is
interrupted. I'm running in xterm.

Joe

#7 Updated by qhwt+dfly over 5 years ago

In fact, I can reproduce it without resizing-terminal tricks:
su root -c 'for i in `jot 8 1`; do pkill -WINCH kernel; sleep 0.1; done'

Cheers.

#8 Updated by dillon over 5 years ago

:> I get the same freeze. It's looping in lwkt_send_ipiq3() inside this
:> while loop:
:>
:> while (ip->ip_windex - ip->ip_rindex > MAXCPUFIFO / 4) {
:> KKASSERT(ip->ip_windex - ip->ip_rindex != MAXCPUFIFO - 1);
:> lwkt_process_ipiq();
:> }
:>
:> I haven't yet tried Matt's debug patch. I don't get the freeze
:> immediately only after the 8th time that read() in vconsgetc() is
:> interrupted. I'm running in xterm.
:>
:> Joe
:
:In fact, I can reproduce it without resizing-terminal tricks:
: su root -c 'for i in `jot 8 1`; do pkill -WINCH kernel; sleep 0.1; done'
:
:Cheers.

Excellent! I added a print_backtrace() to that loop and reproduced
the lockup with the kill -WINCH / sleep! I found the problem!

lwkt_send_ipiq3(202,3,4340758b,0,434071e4) at 0x80c1b84
lwkt_send_ipiq3(40400000,8099f00,3,0,434003b0) at 0x80c1b84
sched_ithd(3,2828d154,4340720c,821de14,3) at 0x8099eef
signalintr(3,2,823bc1c,1,43407574) at 0x821858b
cons_unlock(1c,0,43407228,2,821ddfc) at 0x821de14

What is happening is that the SIGWINCH happens to hit a window where
no critical section is being held. That causes it to call
sched_ithd() instead of flagging the interrupt for future action.
sched_ithd() tries to send an IPI, but because the cpus have been
stopped cold by the debugger the IPI never gets sent. Once the
IPI function FIFO fills up it goes into that loop waiting for
the pending IPIs to be processed (which they never are because all
the other cpus are stopped).
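
To make the lockup concrete, here is a toy single-threaded model of that wait loop. It is a sketch under stated assumptions, not the real lwkt_send_ipiq3(): MAXCPUFIFO is given an assumed value of 16, the `ipiq_model` struct and the `model_*` functions are hypothetical, draining by the target cpu is folded into `model_process()`, and the spin is bounded so the model can report a hang instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCPUFIFO 16 /* assumed size for this sketch only */

struct ipiq_model {
    unsigned windex;     /* advanced by the sending cpu */
    unsigned rindex;     /* advanced when the target cpu drains its FIFO */
    bool target_stopped; /* true while the debugger holds the target cpu */
};

/* Stand-in for the target making progress: it drains only if it is running. */
static void model_process(struct ipiq_model *ip)
{
    if (!ip->target_stopped)
        ip->rindex = ip->windex;
}

/*
 * Queue one IPI, then wait for the FIFO to fall back under the threshold.
 * Returns false if the wait exceeded max_spins, i.e. the real loop would
 * have spun forever.
 */
static bool model_send(struct ipiq_model *ip, int max_spins)
{
    int spins = 0;

    ip->windex++;
    while (ip->windex - ip->rindex > MAXCPUFIFO / 4) {
        /* Same invariant as the KKASSERT in the quoted loop. */
        assert(ip->windex - ip->rindex != MAXCPUFIFO - 1);
        if (++spins > max_spins)
            return false;
        model_process(ip);
    }
    return true;
}
```

With `target_stopped` set, the first few sends succeed and then one send never leaves the loop, because nothing advances `rindex`. With a running target every send returns. The exact count at which the real vkernel hangs depends on details this model leaves out.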

The fix is very simple. I need only adjust the DDB code to enter
a critical section before it stops the cpus and exit it after it
restarts the cpus. I'll commit the fix tonight.
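
The effect of holding a critical section can be sketched with a small model. This is an illustration of the idea only, not the committed fix: `crit_count`, `pending_mask`, `dispatch_count` and the `model_*` functions are hypothetical, and interrupt numbers are assumed to be in the range 0 to 31. While the section is held, an arriving interrupt is only flagged; the dispatch path (the one that would send an IPI) runs when the section is exited.

```c
#include <assert.h>

static int crit_count;        /* nesting depth of the modeled critical section */
static unsigned pending_mask; /* one bit per interrupt flagged for later */
static int dispatch_count;    /* times the dispatch path actually ran */

/* Stand-in for the path that schedules the interrupt thread. */
static void model_dispatch(int irq)
{
    (void)irq;
    dispatch_count++;
}

/* Interrupt arrival: flag it inside a critical section, dispatch otherwise. */
static void model_signalintr(int irq)
{
    if (crit_count > 0)
        pending_mask |= 1u << irq;
    else
        model_dispatch(irq);
}

static void model_crit_enter(void)
{
    crit_count++;
}

/* Leaving the outermost section runs whatever was flagged in the meantime. */
static void model_crit_exit(void)
{
    assert(crit_count > 0);
    if (--crit_count > 0)
        return;
    for (int irq = 0; pending_mask != 0; irq++) {
        if (pending_mask & (1u << irq)) {
            pending_mask &= ~(1u << irq);
            model_dispatch(irq);
        }
    }
}
```

In this model, wrapping the debugger session in model_crit_enter() and model_crit_exit() means any number of signals arriving in between set the same pending bit and produce one dispatch at exit, so nothing tries to reach a stopped cpu.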

Excellent sleuthing!

-Matt
Matthew Dillon
