Bug #1769: panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active - DragonFlyBSD - DragonFlyBSD bugtracker

open

Added by pavalos over 15 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: sjg
Category:
Target version: Unverifiable
Start date:
Due date:
% Done:
Estimated time:

Description

panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
mp_lock = 00000001; cpuid = 1
Trace beginning at frame 0xd82db9b8
panic(ffffffff) at panic+0x14f
panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
transport_processing_oncpu(14,0,0,0,0) at transport_processing_oncpu+0x95
ip_input(e6b7e000) at ip_input+0xf11
ip_input_handler(e6b7e018) at ip_input_handler+0xe
netisr_run(2,e6b7e000) at netisr_run+0xdf
ether_demux_oncpu(d7e2c000,e6b7e000) at ether_demux_oncpu+0x37c
ether_input_oncpu(d7e2c000,e6b7e000) at ether_input_oncpu+0x138
ether_input_handler(e6b7e018) at ether_input_handler+0x102
netmsg_service(e6b7e018,1,0,1,ff8083d4) at netmsg_service+0x9d
tcpmsg_service_loop(0,0,0,0,0) at tcpmsg_service_loop+0x43
lwkt_exit() at lwkt_exit
boot() called on cpu#1
Uptime: 14d22h16m8s
Physical memory: 2043 MB
Dumping 487 MB: 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8

_get_mycpu (di=0xc0466b60) at ./machine/thread.h:83
83          __asm ("movl %%fs:globaldata,%0" : "=r" (gd) : "m"(__mycpu__dummy));


ylem:/var/crash# uname -a
DragonFly ylem.theshell.com 2.7-DEVELOPMENT DragonFly v2.7.3.8.g3219b-DEVELOPMENT #28: Fri May  7 09:16:10 HST 2010     root@ylem.theshell.com:/usr/obj/usr/src/sys/YLEM  i386

This happened twice while switching back and forth between lighttpd
and nginx. The kernel and cores are being uploaded to
leaf:~pavalos/crash/*30 and *31

--Peter

Updated by dillon about 15 years ago

:panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
:mp_lock = 00000001; cpuid = 1
:Trace beginning at frame 0xd82db9b8
:panic(ffffffff) at panic+0x14f
:panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
:tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
:tcp_input(e6b7e000,14,6) at tcp_input+0x3d63

This one is really difficult to track down, even with the
kernel core. I think the only real way to do it is to add
assertions near the top of tcp_input() and tcp_output(), after
the tp is looked up, to assert that tp->tt_msg->tt_cpuid ==
mycpu->gd_cpuid, to try to catch the problem earlier in the
procedure chain.
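
As a rough illustration of the early check described above, here is a minimal userland sketch. The struct and function names (tcp_timer_msg, tcpcb_model, tp_on_owner_cpu, tcp_output_model) are hypothetical stand-ins, not the kernel's definitions; only the tt_msg and tt_cpuid field names come from the panic message, and the current cpu is passed in as a plain argument instead of being read from mycpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption): only the
 * fields named in the panic message are modelled. */
struct tcp_timer_msg {
    int tt_cpuid;                    /* cpu that owns the connection's timers */
};

struct tcpcb_model {
    struct tcp_timer_msg *tt_msg;
};

/* True when the control block is being handled on its owning cpu. */
static bool
tp_on_owner_cpu(const struct tcpcb_model *tp, int cur_cpu)
{
    return tp != NULL && tp->tt_msg != NULL &&
        tp->tt_msg->tt_cpuid == cur_cpu;
}

/* Hypothetical entry point: check ownership right after the tp lookup,
 * before any timer or output code gets a chance to trip over it. */
static void
tcp_output_model(const struct tcpcb_model *tp, int cur_cpu)
{
    assert(tp_on_owner_cpu(tp, cur_cpu));
    /* ... normal output processing would follow here ... */
}
```

The point of checking this early is that a failure then names the caller that handed the tp to the wrong cpu, instead of surfacing deep inside the timer code.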

Even worse, we still have ipv6 hacks for the tcp stack that
put all ipv6 transport processing on cpu 0, and ipv6->ipv4
conversion hacks for connections that screw up the model.
It's a real mess, frankly.

-Matt

Updated by aoiko about 15 years ago

On 26/05/2010 02:42 AM, Matthew Dillon wrote:

:panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
:mp_lock = 00000001; cpuid = 1
:Trace beginning at frame 0xd82db9b8
:panic(ffffffff) at panic+0x14f
:panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
:tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
:tcp_input(e6b7e000,14,6) at tcp_input+0x3d63

> This one is really difficult to track down, even with the
> kernel core. I think the only real way to do it is to add
> assertions near the top of tcp_input() and tcp_output(), after
> the tp is looked up, to assert that tp->tt_msg->tt_cpuid ==
> mycpu->gd_cpuid, to try to catch the problem earlier in the
> procedure chain.

We could also use ktr to log sent/delivered netmsgs. That, plus a large
value for KTR_ENTRIES, should give us enough info to debug the problem.
I've often thought of doing that while debugging netmp, and libevtr will
make analyzing the data much easier.
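
To make the logging idea concrete, here is a small userland sketch of a fixed-size event ring that records sent and delivered messages and then looks for a message delivered on a cpu other than the one it was sent to. It is not the ktr API: evlog, evlog_record and evlog_find_mismatch are hypothetical names, and EVLOG_ENTRIES only plays the role that a large KTR_ENTRIES would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVLOG_ENTRIES 1024       /* power of two; a larger ring keeps more history */

enum evlog_kind { EV_MSG_SENT, EV_MSG_DELIVERED };

struct evlog_entry {
    uint64_t seq;                /* global event number */
    enum evlog_kind kind;
    int cpu;                     /* target cpu (sent) or executing cpu (delivered) */
    const void *msg;             /* identity of the message */
};

struct evlog {
    struct evlog_entry ring[EVLOG_ENTRIES];
    uint64_t next;               /* total number of events recorded so far */
};

/* Append one event, overwriting the oldest entry once the ring is full. */
static void
evlog_record(struct evlog *log, enum evlog_kind kind, int cpu, const void *msg)
{
    assert(log != NULL);
    struct evlog_entry *e = &log->ring[log->next & (EVLOG_ENTRIES - 1)];
    e->seq = log->next++;
    e->kind = kind;
    e->cpu = cpu;
    e->msg = msg;
}

/* Return the first delivery whose cpu differs from the most recent send of
 * the same message, or NULL if every logged delivery matches its send. */
static const struct evlog_entry *
evlog_find_mismatch(const struct evlog *log)
{
    uint64_t count = log->next < EVLOG_ENTRIES ? log->next : EVLOG_ENTRIES;
    uint64_t start = log->next - count;

    for (uint64_t i = start; i < log->next; i++) {
        const struct evlog_entry *d = &log->ring[i & (EVLOG_ENTRIES - 1)];
        if (d->kind != EV_MSG_DELIVERED)
            continue;
        for (uint64_t j = i; j > start; j--) {
            const struct evlog_entry *s =
                &log->ring[(j - 1) & (EVLOG_ENTRIES - 1)];
            if (s->kind == EV_MSG_SENT && s->msg == d->msg) {
                if (s->cpu != d->cpu)
                    return d;
                break;           /* only the most recent send counts */
            }
        }
    }
    return NULL;
}
```

With a ring that is too small, the send that matches a bad delivery may already have been overwritten, which is why the entry count matters for this kind of post-mortem.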

Aggelos

Updated by dillon about 15 years ago

I've decided to undertake some major reworking of our network
stack. Well, it is really more a continuation of the work that
was done last year.

I already have the protosw cleaned up and ip_off and ip_len fixed
up so they are left in network byte order (instead of switching back
and forth). I have some more work to do with ip_len to get rid of
the header length trimming code and a ton of other stuff. This
will remove all the packet back-and-forth munging that is being done
now.
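
A minimal sketch of what leaving the fields in network byte order means for readers of the header: the stored values are never rewritten, and each consumer converts at the point of use. The struct ip_hdr_model and the helper names are hypothetical and cover only the two fields mentioned; the flag and mask values are the standard IPv4 more-fragments bit and fragment-offset mask.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two IPv4 header fields discussed (assumption: this is
 * not the full struct ip, just enough to show the access pattern). */
struct ip_hdr_model {
    uint16_t ip_len;             /* total length, always network byte order */
    uint16_t ip_off;             /* flags + fragment offset, network byte order */
};

#define IPM_MF      0x2000       /* more-fragments flag */
#define IPM_OFFMASK 0x1fff       /* fragment offset mask */

/* Convert on read; the header itself is never modified. */
static uint16_t
ip_total_len(const struct ip_hdr_model *ip)
{
    assert(ip != NULL);
    return ntohs(ip->ip_len);
}

/* A packet is a fragment if MF is set or the offset is non-zero. */
static int
ip_is_fragment(const struct ip_hdr_model *ip)
{
    assert(ip != NULL);
    return (ntohs(ip->ip_off) & (IPM_MF | IPM_OFFMASK)) != 0;
}
```

Because nothing swaps the fields in place, a packet can be handed between layers or forwarded without anyone having to remember which byte order it is currently in.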

This all leads up to being able to remove all the special cpu selection
cases in the individual protocol stacks and in particular cleaning up
the tcp syncache, and doing unconditional cpu selection closer to the
netif.
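
One way to picture unconditional cpu selection is a single function that maps a connection's address/port tuple to a cpu, applied to every packet before any protocol code sees it. The sketch below is an assumption for illustration only: flow_key, flow_to_cpu and the mixing constant are hypothetical and are not the hash DragonFly uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection identity; fields are assumed to be taken
 * straight from the packet headers. */
struct flow_key {
    uint32_t laddr, faddr;       /* local and foreign addresses */
    uint16_t lport, fport;       /* local and foreign ports */
};

/* Map a flow to a cpu. The only property that matters here is that the
 * same tuple always yields the same cpu, so every packet of a connection
 * is processed where its tcpcb and timers live. */
static unsigned
flow_to_cpu(const struct flow_key *k, unsigned ncpus)
{
    assert(ncpus > 0);
    uint32_t h = k->laddr ^ k->faddr ^
        (((uint32_t)k->lport << 16) | k->fport);
    h ^= h >> 16;
    h *= 0x45d9f3bU;             /* arbitrary odd multiplier to mix the bits */
    h ^= h >> 16;
    return h % ncpus;
}
```

If every path into the stack used one such mapping, there would be no special case that could place a connection's packets on a cpu other than the one owning its timers.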

When I get IPV4 stable I'll put the git branch up on leaf for people to
test. IPV6, IPV4 & IPV6 fragmentation, ICMP, and IPSEC will need a lot
of testing. Also PF (particularly NAT) and IPFW. There will be
lots of things needing testing.