Bug #1322

panic with high signal load

Added by josepht over 5 years ago. Updated about 5 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

I managed to panic the kernel while doing some testing using signals.
The application was doing the following:

fork child

child sets up signal handler for SIGHUP then loops forever calling
pause().

parent sets up signal handler for SIGHUP then loops count number
of times sending a SIGHUP to the child then calls pause().

child's SIGHUP handler just sends a SIGHUP to the parent.
parent's SIGHUP handler calculates the round-trip time for the signal.

This appears to work fine for count < 1000 or so. I tried an
iteration where count = 5000 and panic'ed the kernel. I was unable to
get the panic message from the serial console but was able to get the
following trace from DDB:

db> trace
Debugger(c03d444f) at Debugger+0x34
panic(c03c8398,c040a210,c03c7238,d2684d58,2) at panic+0x9f
userret(6,0,0,d2684d58,c041f11c) at userret+0x16a
syscall2(d8c9dd40) at syscall2+0x2d6
Xint0x80_syscall() at Xint0x80_syscall+0x36

I can attempt to reproduce this if needed and can also provide the
source for the application. I still have the debug kernel but wasn't
able to glean any useful information from it myself.

Thanks,
Joe

History

#1 Updated by dillon over 5 years ago

:I managed to panic the kernel while doing some testing using signals.
:The application was doing the following:
:
:fork child
:
:child sets up signal handler for SIGHUP then loops forever calling
: pause().
:
:parent sets up signal handler for SIGHUP then loops count number
: of times sending a SIGHUP to the child then calls pause().
:
:child's SIGHUP handler just sends a SIGHUP to the parent.
:parent's SIGHUP handler calculates the round-trip time for the signal.
:
:This appears to work fine for count < 1000 or so. I tried an
:iteration where count = 5000 and panic'ed the kernel. I was unable to
:get the panic message from the serial console but was able to get the
:following trace from DDB:
:
:db> trace
:Debugger(c03d444f) at Debugger+0x34
:panic(c03c8398,c040a210,c03c7238,d2684d58,2) at panic+0x9f
:userret(6,0,0,d2684d58,c041f11c) at userret+0x16a
:syscall2(d8c9dd40) at syscall2+0x2d6
:Xint0x80_syscall() at Xint0x80_syscall+0x36
:
:I can attempt to reproduce this if needed and can also provide the
:source for the application. I still have the debug kernel but wasn't
:able to glean any useful information from it myself.
:
:Thanks,
:Joe

I think I need the program to reproduce it. I wrote a program based
on your description, which I include below, but it doesn't seem to
reproduce the problem.

In my program instead of having the parent send a SIGHUP from its
main loop I just have the two signal handlers ping-pong the signal,
with the child serving the first ball.

-Matt
Matthew Dillon

#include <sys/time.h>   /* struct timeval, gettimeofday() */
#include <signal.h>
#include <stdint.h>     /* int64_t */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void sig_child(int sig);
static void sig_parent(int sig);

pid_t Pid_parent;
pid_t Pid_child;

struct timeval Tv1;
struct timeval Tv2;
struct timeval TvDelta;

int64_t Count;

int
main(int ac, char **av)
{
    Pid_parent = getpid();
    signal(SIGHUP, sig_parent);

    Pid_child = fork();
    if (Pid_child == -1) {
        /* Never let sig_parent() call kill(-1, SIGHUP). */
        perror("fork");
        exit(1);
    }
    if (Pid_child == 0) {
        Pid_child = getpid();

        signal(SIGHUP, sig_child);
        kill(Pid_parent, SIGHUP);   /* start it going */
        for (;;)
            pause();
    }

    /*
     * NOTE: Count and TvDelta updates can race, so we may occasionally
     * print a bad value.
     */
    for (;;) {
        pause();
        if (Count % 10000 == 0) {
            /* Cast: int64_t is not necessarily long long. */
            printf("%9lld %6.2fuS\n",
                (long long)Count,
                ((double)TvDelta.tv_sec * 1000000.0 +
                 (double)TvDelta.tv_usec) / (double)Count
            );
        }
    }
}

static
void
sig_child(int sig)
{
    kill(Pid_parent, SIGHUP);
}

static
void
sig_parent(int sig)
{
    int usec;

    ++Count;
    Tv1 = Tv2;
    gettimeofday(&Tv2, NULL);
    if (Count > 1) {
        /* Time since the previous signal, in microseconds. */
        usec = (Tv2.tv_sec - Tv1.tv_sec) * 1000000 +
               (Tv2.tv_usec - Tv1.tv_usec);

        /* Accumulate into TvDelta, carrying whole seconds out of tv_usec. */
        usec += TvDelta.tv_usec;
        if (usec >= 1000000) {
            TvDelta.tv_sec += usec / 1000000;
            usec %= 1000000;
        }
        TvDelta.tv_usec = usec;
    }
    kill(Pid_child, SIGHUP);
}

#2 Updated by corecode over 5 years ago

Joseph, can you please post the original test code?

#3 Updated by josepht over 5 years ago

On Thu, Apr 23, 2009 at 07:54:59AM +0000, Simon 'corecode' Schubert (via DragonFly issue tracker) wrote:
>
> Simon 'corecode' Schubert added the comment:
>
> Joseph, can you please post the original test code?

I think this issue can be closed. Neither I nor Matt was able to
reproduce this.

Joe

#4 Updated by aoiko about 5 years ago

unreproducible
