Bug #1322
panic with high signal load
Status: closed
Description
I managed to panic the kernel while doing some testing using signals.
The application was doing the following:
fork child
child sets up a signal handler for SIGHUP, then loops forever calling
  pause().
parent sets up a signal handler for SIGHUP, then loops count times
  sending a SIGHUP to the child, then calls pause().
child's SIGHUP handler just sends a SIGHUP to the parent.
parent's SIGHUP handler calculates the round-trip time for the signal.
This appears to work fine for count < 1000 or so. When I tried a run
with count = 5000, the kernel panicked. I was unable to get the panic
message from the serial console, but was able to get the following
trace from DDB:
db> trace
Debugger(c03d444f) at Debugger+0x34
panic(c03c8398,c040a210,c03c7238,d2684d58,2) at panic+0x9f
userret(6,0,0,d2684d58,c041f11c) at userret+0x16a
syscall2(d8c9dd40) at syscall2+0x2d6
Xint0x80_syscall() at Xint0x80_syscall+0x36
I can attempt to reproduce this if needed and can also provide the
source for the application. I still have the debug kernel but wasn't
able to glean any useful information from it myself.
Thanks,
Joe
Updated by dillon almost 16 years ago
I think I need the program to reproduce it. I wrote a program based
on your description, which I include below, but it doesn't seem to
reproduce the problem.
In my program instead of having the parent send a SIGHUP from its
main loop I just have the two signal handlers ping-pong the signal,
with the child serving the first ball.
-Matt
Matthew Dillon
<dillon@backplane.com>
#include <sys/types.h>
#include <sys/time.h>		/* struct timeval, gettimeofday() */
#include <signal.h>
#include <stdint.h>		/* int64_t */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void sig_child(int sig);
static void sig_parent(int sig);

pid_t Pid_parent;
pid_t Pid_child;
struct timeval Tv1;		/* time of the previous signal */
struct timeval Tv2;		/* time of the current signal */
struct timeval TvDelta;		/* accumulated round-trip time */
int64_t Count;			/* signals received by the parent */

int
main(int ac, char **av)
{
	Pid_parent = getpid();
	signal(SIGHUP, sig_parent);
	if ((Pid_child = fork()) == 0) {
		Pid_child = getpid();
		signal(SIGHUP, sig_child);
		kill(Pid_parent, SIGHUP);	/* start it going */
		for (;;)
			pause();
	}

	/*
	 * NOTE: Count and TvDelta updates can race, so we may occasionally
	 * print a bad value.
	 */
	for (;;) {
		pause();
		if (Count % 10000 == 0) {
			printf("%9lld %6.2fuS\n",
			    (long long)Count,
			    ((double)TvDelta.tv_sec * 1000000.0 +
			    (double)TvDelta.tv_usec) / (double)Count
			);
		}
	}
}

/* Child: bounce the signal straight back to the parent. */
static
void
sig_child(int sig)
{
	kill(Pid_parent, SIGHUP);
}

/* Parent: accumulate the time since the last signal, then reply. */
static
void
sig_parent(int sig)
{
	int usec;

	++Count;
	Tv1 = Tv2;
	gettimeofday(&Tv2, NULL);
	if (Count > 1) {
		usec = (Tv2.tv_sec - Tv1.tv_sec) * 1000000 +
		    (Tv2.tv_usec - Tv1.tv_usec);
		usec += TvDelta.tv_usec;
		/* carry whole seconds so tv_usec stays below 1000000 */
		if (usec >= 1000000) {
			TvDelta.tv_sec += usec / 1000000;
			usec %= 1000000;
		}
		TvDelta.tv_usec = usec;
	}
	kill(Pid_child, SIGHUP);
}
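The running total in TvDelta depends on carrying whole seconds out of the microsecond field. As an illustration only, here is that carry isolated into a standalone helper, assuming the added interval is non-negative. The names tv_accumulate and tv_mean_usec are hypothetical and are not part of the program above.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: add `usec` microseconds (assumed >= 0) to *total,
 * keeping total->tv_usec in the range 0..999999.
 */
static void
tv_accumulate(struct timeval *total, long usec)
{
	long sum = (long)total->tv_usec + usec;

	total->tv_sec += sum / 1000000;		/* carry whole seconds */
	total->tv_usec = sum % 1000000;		/* keep the remainder */
}

/* Hypothetical helper: mean interval in microseconds over `count` samples. */
static double
tv_mean_usec(const struct timeval *total, long long count)
{
	return (((double)total->tv_sec * 1000000.0 +
	    (double)total->tv_usec) / (double)count);
}
```

The boundary case is a sum of exactly 1000000 microseconds, which should carry into tv_sec and leave tv_usec at 0.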
Updated by corecode almost 16 years ago
Joseph, can you please post the original test code?
Updated by josepht almost 16 years ago
On Thu, Apr 23, 2009 at 07:54:59AM +0000, Simon 'corecode' Schubert <corecode@fs.ei.tum.de> wrote:
:Joseph, can you please post the original test code?
I think this issue can be closed. Neither Matt nor I was able to
reproduce this.
Joe