Bug #1325
closed
ddb stops responding after resizing terminal window of vkernel
Description
Greetings everyone.
When I resize the terminal window, while being at the ddb prompt inside a
vkernel, the ddb stops responding. That is, I hit keys but they don't appear at
the prompt. The vkernel is in a 'pause' state and the only way to shut it down
is by kill'ing -9 it. Mind that this only happens if I pass >1 as the number of
CPUs to emulate, so it must be some kind of SMP issue. The problem is 100%
reproducible, but I couldn't find a pattern of resizes to trigger. Still, 3-5
vertical resizes up/down usually suffice.
Cheers,
Stathis
Updated by Anonymous almost 16 years ago
: ... and the only way to shut it down is by kill'ing -9 it
josepht hinted that you can just Ctrl+C it as well.
Cheers,
Stathis
Updated by dillon almost 16 years ago
:New submission from Stathis Kamperis <ekamperi@gmail.com>:
:
:Greetings everyone.
:
:When I resize the terminal window, while being at the ddb prompt inside a
:vkernel, the ddb stops responding. That is, I hit keys but they don't appear at
:the prompt. The vkernel is in a 'pause' state and the only way to shut it down
:is by kill'ing -9 it. Mind that this only happens if I pass >1 as the number of
:CPUs to emulate, so it must be some kind of SMP issue. The problem is 100%
:reproducible, but I couldn't find a pattern of resizes to trigger. Still, 3-5
:vertical resizes up/down usually suffice.
:
:Cheers,
:Stathis
So far I have not been able to reproduce the issue. Here's a patch
that just adds some debugging write()'s to the SIGWINCH path in the
vkernel.
When running normally you should see "WXYZ" appear on the console.
If you resize while you are in DDB you should only see "W" appear,
then when you 'cont' from DDB you should see the "XYZ".
If while in DDB the "XYZ" or even just the "X" occurs, then pending
interrupts are improperly being processed while in DDB mode. The
kernel will not block the actual SIGWINCH signal (hence the "W"
appears), but the interrupt made pending by signalintr() should
not run until normal operation is resumed via 'cont'.
So the question is, what do you get when you get the freeze?
-Matt
Matthew Dillon
<dillon@backplane.com>
diff --git a/sys/platform/vkernel/platform/console.c b/sys/platform/vkernel/platform/console.c
index b4088ab..736325c 100644
--- a/sys/platform/vkernel/platform/console.c
+++ b/sys/platform/vkernel/platform/console.c
@@ -274,6 +274,7 @@ vconssignal(int sig)
 static void
 vconswinchsig(int __unused sig)
 {
+	write(2, "W", 1);
 	signalintr(3);
 }
 
@@ -282,6 +283,7 @@ vconswinch_intr(void *arg __unused, void *frame __unused)
 {
 	struct winsize newsize;
 
+	write(2, "X", 1);
 	if (vconsole != NULL && vconsole->cn_dev->si_tty != NULL) {
 		ioctl(0, TIOCGWINSZ, &newsize);
 		/*
@@ -292,7 +294,9 @@ vconswinch_intr(void *arg __unused, void *frame __unused)
 		if (bcmp(&newsize, &vconsole->cn_dev->si_tty->t_winsize,
 			 sizeof(newsize)) != 0) {
 			vconsole->cn_dev->si_tty->t_winsize = newsize;
+			write(2, "Y", 1);
 			pgsignal(vconsole->cn_dev->si_tty->t_pgrp, SIGWINCH, 1);
+			write(2, "Z", 1);
 		}
 	}
 }
Updated by Anonymous almost 16 years ago
: When running normally you should see "WXYZ" appear on the console.
True, e.g.
Welcome to DragonFly!
vkernel# WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ
:If you resize while you are in DDB you should only see "W" appear,
True (but see below), e.g.
CPU1 stopping CPUs: 0x00000001
stopped
Stopped at 0x8213929: movb $0,0x835c0d4
db> WWWWWWWWWWWWWWWWWWW
:then when you 'cont' from DDB you should see the "XYZ".
Not really, e.g.
db> c
CPU1 restarting CPUs: 0x00000001
restarted
WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ
:If while in DDB the "XYZ" or even just the "X" occurs, then pending
:interrupts are improperly being processed while in DDB mode.
Not encountered.
Though, when I resize inside DDB and get the 'W', the problem manifests. I can't
type anything at the ddb prompt, whereas the 'W's show up fine.
Does anyone else see this ?
Cheers,
Stathis
Updated by dillon almost 16 years ago
:Stathis Kamperis <ekamperi@gmail.com> added the comment:
:...
:
:CPU1 stopping CPUs: 0x00000001
: stopped
:Stopped at 0x8213929: movb $0,0x835c0d4
:db> WWWWWWWWWWWWWWWWWWW
:
:
::then when you 'cont' from DDB you should see the "XYZ".
:Not really, e.g.
:
:db> c
:
:CPU1 restarting CPUs: 0x00000001
: restarted
:WXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZWXYZ
I get a single XYZ when I cont (without doing any additional
window resizing). You got 11 of them?
What is the exact command line you are using to start your
vkernel up?
:Though, when I resize inside DDB and get the 'W', the problem manifests. I can't
:type anything at the ddb prompt, whereas the 'W's show up fine.
:
:Does anyone else see this ?
:
:Cheers,
:Stathis
Very odd. I wonder if the terminal program you are using has
something to do with it. Are you using an xterm or something
else? Try using an xterm if you aren't and see if it still
occurs.
-Matt
Updated by Anonymous almost 16 years ago
: Very odd. I wonder if the terminal program you are using has
: something to do with it. Are you using an xterm or something
: else? Try using an xterm if you aren't and see if it still
: occurs.
I'm using xterm.
Anyway, it's not that serious. I'm going to investigate it a bit myself. In the
meantime if anyone else can try to reproduce it, it would be great.
Cheers,
Stathis
Updated by josepht almost 16 years ago
I get the same freeze. It's looping in lwkt_send_ipiq3() inside this
while loop:
    while (ip->ip_windex - ip->ip_rindex > MAXCPUFIFO / 4) {
        KKASSERT(ip->ip_windex - ip->ip_rindex != MAXCPUFIFO - 1);
        lwkt_process_ipiq();
    }
I haven't yet tried Matt's debug patch. The freeze doesn't happen
immediately, only after the 8th time that read() in vconsgetc() is
interrupted. I'm running in xterm.
Joe
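To illustrate why that loop cannot exit once the target stops draining its queue, here is a small standalone model of the index arithmetic. It is a sketch only: the `MAXCPUFIFO` value of 16 is an assumption, the struct and function names are hypothetical, and the model stands in for the real lwkt_send_ipiq3() with a bounded spin so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCPUFIFO 16	/* assumed value, for illustration only */

/* Hypothetical model of one per-cpu IPI FIFO. */
struct ipiq_model {
	unsigned windex;	/* advanced by the sender */
	unsigned rindex;	/* advanced by the target cpu as it drains */
};

/*
 * Queue one entry. Returns true if it was queued, false if the sender
 * would have spun forever (the model gives up after max_spins).
 */
bool
model_send_ipiq(struct ipiq_model *ip, bool target_running, int max_spins)
{
	int spins = 0;

	while (ip->windex - ip->rindex > MAXCPUFIFO / 4) {
		assert(ip->windex - ip->rindex != MAXCPUFIFO - 1);
		if (target_running)
			ip->rindex = ip->windex;	/* target drains its queue */
		else if (++spins >= max_spins)
			return false;			/* nobody will ever drain it */
	}
	ip->windex++;
	return true;
}

/* Count how many sends succeed before the first stall. */
int
sends_before_stall(bool target_running, int attempts)
{
	struct ipiq_model ip = { 0, 0 };
	int i;

	for (i = 0; i < attempts; i++) {
		if (!model_send_ipiq(&ip, target_running, 1000))
			return i;
	}
	return attempts;
}
```

With the target running, every send goes through. With the target stopped, the model stalls as soon as more than MAXCPUFIFO / 4 entries are outstanding, because rindex never moves.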
Updated by qhwt+dfly almost 16 years ago
In fact, I can reproduce it without resizing-terminal tricks:
su root -c 'for i in `jot 8 1`; do pkill -WINCH kernel; sleep 0.1; done'
Cheers.
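The same burst can be sent from a small C helper if a shell loop is not convenient. This is a sketch, not a tested reproducer: `send_winch_burst()` is a hypothetical name, and it takes a pid directly, whereas pkill above matches the process by name.

```c
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/*
 * Send SIGWINCH to pid `count` times, sleeping delay_ms milliseconds
 * after each one. Returns the number of signals sent successfully.
 */
int
send_winch_burst(pid_t pid, int count, long delay_ms)
{
	struct timespec delay;
	int sent = 0;
	int i;

	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < count; i++) {
		if (kill(pid, SIGWINCH) == 0)
			sent++;
		nanosleep(&delay, NULL);
	}
	return sent;
}
```

Calling send_winch_burst(pid, 8, 100) corresponds to the eight signals at 0.1 second intervals in the one-liner; sending to a process owned by another user needs the same privileges as the su in the shell version.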
Updated by dillon almost 16 years ago
:> I get the same freeze. It's looping in lwkt_send_ipiq3() inside this
:> while loop:
:>
:>     while (ip->ip_windex - ip->ip_rindex > MAXCPUFIFO / 4) {
:>         KKASSERT(ip->ip_windex - ip->ip_rindex != MAXCPUFIFO - 1);
:>         lwkt_process_ipiq();
:>     }
:>
:> I haven't yet tried Matt's debug patch. I don't get the freeze
:> immediately only after the 8th time that read() in vconsgetc() is
:> interrupted. I'm running in xterm.
:>
:> Joe
:
:In fact, I can reproduce it without resizing-terminal tricks:
: su root -c 'for i in `jot 8 1`; do pkill -WINCH kernel; sleep 0.1; done'
:
:Cheers.
Excellent! I added a print_backtrace() to that loop and reproduced
the lockup with the kill -WINCH / sleep! I found the problem!
lwkt_send_ipiq3(202,3,4340758b,0,434071e4) at 0x80c1b84
lwkt_send_ipiq3(40400000,8099f00,3,0,434003b0) at 0x80c1b84
sched_ithd(3,2828d154,4340720c,821de14,3) at 0x8099eef
signalintr(3,2,823bc1c,1,43407574) at 0x821858b
cons_unlock(1c,0,43407228,2,821ddfc) at 0x821de14
What is happening is that the SIGWINCH happens to hit a window where
no critical section is being held. That causes it to call
sched_ithd() instead of flagging the interrupt for future action.
sched_ithd() tries to send an IPI, but because the cpus have been
stopped cold by the debugger the IPI never gets sent. Once the
IPI function FIFO fills up it goes into that loop waiting for
the pending IPIs to be processed (which they never are because all
the other cpus are stopped).
The fix is very simple. I need only adjust the DDB code to enter
a critical section before it stops the cpus and exit it after it
restarts the cpus. I'll commit the fix tonight.
Excellent sleuthing!
-Matt
Matthew Dillon
<dillon@backplane.com>
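The effect of holding a critical section across the debugger session can be sketched with a toy model. Everything here is hypothetical (the struct, the `model_*` names, the counters) and it simplifies the real code a lot: an interrupt that arrives outside a critical section is dispatched at once, one that arrives inside is only flagged and runs when the section is exited.

```c
#include <stdbool.h>

/* Hypothetical model of the current cpu's interrupt state. */
struct cpu_model {
	int	 crit_count;	/* > 0 while inside a critical section */
	unsigned pending;	/* bitmask of flagged (deferred) interrupts */
	int	 dispatched;	/* dispatches that completed normally */
	int	 stalled;	/* dispatches attempted while cpus were stopped */
	bool	 others_stopped;
};

/* Stand-in for sched_ithd(): needs the other cpus to be running. */
static void
model_dispatch(struct cpu_model *c, int intr)
{
	c->pending &= ~(1u << intr);
	if (c->others_stopped)
		c->stalled++;	/* in the real kernel the IPI is never processed */
	else
		c->dispatched++;
}

/* Stand-in for signalintr(): dispatch now, or flag for later. */
void
model_signalintr(struct cpu_model *c, int intr)
{
	c->pending |= 1u << intr;
	if (c->crit_count == 0)
		model_dispatch(c, intr);
}

void
model_crit_enter(struct cpu_model *c)
{
	c->crit_count++;
}

/* Leaving the outermost critical section runs whatever was flagged. */
void
model_crit_exit(struct cpu_model *c)
{
	int intr;

	if (--c->crit_count != 0)
		return;
	for (intr = 0; intr < 32; intr++) {
		if (c->pending & (1u << intr))
			model_dispatch(c, intr);
	}
}

/*
 * One debugger session with `resizes` SIGWINCHs arriving while the other
 * cpus are stopped. Returns the number of stalled dispatch attempts.
 */
int
model_ddb_session(struct cpu_model *c, bool hold_crit, int resizes)
{
	int i;

	if (hold_crit)
		model_crit_enter(c);	/* the fix: enter before stopping cpus */
	c->others_stopped = true;
	for (i = 0; i < resizes; i++)
		model_signalintr(c, 3);
	c->others_stopped = false;
	if (hold_crit)
		model_crit_exit(c);	/* and exit after restarting them */
	return c->stalled;
}
```

Without the critical section every resize attempts a dispatch while the cpus are stopped. With it, the resizes collapse into one pending bit that runs once after the restart, consistent with a single "XYZ" showing up on 'cont'.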