Bug #1769
open
panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
Description
panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
mp_lock = 00000001; cpuid = 1
Trace beginning at frame 0xd82db9b8
panic(ffffffff) at panic+0x14f
panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
transport_processing_oncpu(14,0,0,0,0) at transport_processing_oncpu+0x95
ip_input(e6b7e000) at ip_input+0xf11
ip_input_handler(e6b7e018) at ip_input_handler+0xe
netisr_run(2,e6b7e000) at netisr_run+0xdf
ether_demux_oncpu(d7e2c000,e6b7e000) at ether_demux_oncpu+0x37c
ether_input_oncpu(d7e2c000,e6b7e000) at ether_input_oncpu+0x138
ether_input_handler(e6b7e018) at ether_input_handler+0x102
netmsg_service(e6b7e018,1,0,1,ff8083d4) at netmsg_service+0x9d
tcpmsg_service_loop(0,0,0,0,0) at tcpmsg_service_loop+0x43
lwkt_exit() at lwkt_exit
boot() called on cpu#1
Uptime: 14d22h16m8s
Physical memory: 2043 MB
Dumping 487 MB: 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8
_get_mycpu (di=0xc0466b60) at ./machine/thread.h:83
83 __asm ("movl %%fs:globaldata,%0" : "=r" (gd) : "m"(__mycpu__dummy));

ylem:/var/crash# uname -a
DragonFly ylem.theshell.com 2.7-DEVELOPMENT DragonFly v2.7.3.8.g3219b-DEVELOPMENT #28: Fri May 7 09:16:10 HST 2010 root@ylem.theshell.com:/usr/obj/usr/src/sys/YLEM i386

This happened twice while switching back and forth between lighttpd
and nginx. Kernel and cores being uploaded to
leaf:~pavalos/crash/*30 and *31
--Peter
Updated by dillon over 14 years ago
:panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
:mp_lock = 00000001; cpuid = 1
:Trace beginning at frame 0xd82db9b8
:panic(ffffffff) at panic+0x14f
:panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
:tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
:tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
This one is really difficult to track down even with the
kernel core. I think the only real way to do it is to add
assertions near the top of tcp_input() and tcp_output() after
the tp is looked up to assert that tt->tt_msg->tt_cpuid ==
mycpu->gd_cpuid, to try to catch the problem earlier in the
procedure chain.
Even worse, we still have ipv6 hacks for the tcp stack that
put all ipv6 transport processing on cpu 0, and ipv6->ipv4
conversion hacks for connections that screw up the model.
It's a real mess, frankly.
-Matt
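The early check suggested in the comment above could look roughly like the following userland sketch. It is only a model: the struct names are hypothetical stand-ins, and only tt_msg and tt_cpuid are taken from the assertion text in this report. In the kernel the check would presumably be a KASSERT placed right after the tp lookup in tcp_input() and tcp_output(), so that a connection running on the wrong cpu is caught before it reaches tcp_callout_active.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel structures. Only the field
 * names tt_msg and tt_cpuid come from the assertion in this report.
 */
struct tcp_timer_msg {          /* hypothetical name */
    int tt_cpuid;               /* cpu that owns this connection's timers */
};

struct tcpcb_model {            /* hypothetical, not the real struct tcpcb */
    struct tcp_timer_msg *tt_msg;
};

/* Returns 1 when the connection is owned by the cpu we are running on. */
int
tp_owned_by_cpu(const struct tcpcb_model *tp, int mycpuid)
{
    return (tp != NULL && tp->tt_msg != NULL &&
        tp->tt_msg->tt_cpuid == mycpuid);
}

/*
 * Model of the check that would sit right after the tp lookup.
 * It aborts at the entry point instead of deep inside the timer code.
 */
void
tcp_entry_check(const struct tcpcb_model *tp, int mycpuid)
{
    assert(tp_owned_by_cpu(tp, mycpuid));
}
```

Failing at the entry point would put the offending caller directly in the backtrace, rather than several frames away from where the cpu mismatch was introduced.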
Updated by aoiko over 14 years ago
On 26/05/2010 02:42 AM, Matthew Dillon wrote:
:panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
:mp_lock = 00000001; cpuid = 1
:Trace beginning at frame 0xd82db9b8
:panic(ffffffff) at panic+0x14f
:panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
:tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
:tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
This one is really difficult to track down even with the
kernel core. I think the only real way to do it is to add
assertions near the top of tcp_input() and tcp_output() after
the tp is looked up to assert that tt->tt_msg->tt_cpuid ==
mycpu->gd_cpuid, to try to catch the problem earlier in the
procedure chain.
We could also use ktr to log sent/delivered netmsgs. That and a large
value for KTR_ENTRIES should give us enough info to debug the problem.
I've often thought of doing that while debugging netmp, and libevtr will
make analyzing the data much easier.
Aggelos
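To illustrate the logging idea above, here is a minimal userland sketch of a fixed-size trace ring that records on which cpu a message was sent and on which it was delivered. It does not use the real KTR macros. Every name in it (trace_ring, trace_log, TRACE_ENTRIES and so on) is hypothetical, and TRACE_ENTRIES merely plays the role that a large KTR_ENTRIES would play in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 8     /* hypothetical; stands in for a large KTR_ENTRIES */

enum trace_event { TRACE_SENT, TRACE_DELIVERED };

struct trace_entry {
    enum trace_event event;
    uintptr_t msg;          /* address of the message being tracked */
    int cpuid;              /* cpu on which the event happened */
};

struct trace_ring {
    struct trace_entry entries[TRACE_ENTRIES];
    size_t next;            /* total number of records ever logged */
};

/* Append one record, overwriting the oldest one once the ring is full. */
void
trace_log(struct trace_ring *r, enum trace_event ev, uintptr_t msg, int cpuid)
{
    struct trace_entry *e;

    assert(r != NULL);
    e = &r->entries[r->next % TRACE_ENTRIES];
    e->event = ev;
    e->msg = msg;
    e->cpuid = cpuid;
    r->next++;
}

/* Number of records that are still present in the ring. */
size_t
trace_count(const struct trace_ring *r)
{
    return (r->next < TRACE_ENTRIES ? r->next : TRACE_ENTRIES);
}

/*
 * Walk the ring from newest to oldest and return the cpu of the most
 * recent matching record, or -1 if it has already been overwritten.
 */
int
trace_last_cpu(const struct trace_ring *r, enum trace_event ev, uintptr_t msg)
{
    size_t n = trace_count(r);

    for (size_t i = 0; i < n; i++) {
        const struct trace_entry *e =
            &r->entries[(r->next - 1 - i) % TRACE_ENTRIES];
        if (e->event == ev && e->msg == msg)
            return (e->cpuid);
    }
    return (-1);
}
```

The overwrite behaviour is the reason a large entry count matters: with a small ring, the record showing where a message was sent may be gone by the time the panic fires.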
Updated by dillon over 14 years ago
I've decided to undertake some major reworking of our network
stack. Well, it is really more a continuation of the work that
was done last year.
I already have the protosw cleaned up and ip_off and ip_len fixed
up so they are left in network byte order (instead of switching back
and forth). I have some more work to do with ip_len to get rid of
the header length trimming code and a ton of other stuff. This
will remove all the packet back-and-forth munging that is being done
now.
This all leads up to being able to remove all the special cpu selection
cases in the individual protocol stacks and in particular cleaning up
the tcp syncache, and doing unconditional cpu selection closer to the
netif.
When I get IPV4 stable I'll put the git branch up on leaf for people to
test. IPV6, IPV4 & IPV6 fragmentation, ICMP, and IPSEC will need a lot
of testing. Also PF (particularly NAT) and IPFW. There will be
lots of things needing testing.
-Matt
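As a rough illustration of the ip_off/ip_len change described in the comment above, the sketch below leaves both fields in network byte order and converts only at the point of use, so the header is never rewritten in place. This is not the actual branch code. The struct and helper names are hypothetical, and the flag constants are local copies of the standard IPv4 more-fragments bit and fragment-offset mask.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical trimmed-down IPv4 header; fields stay in network order. */
struct ip_hdr_model {
    uint8_t  ip_vhl;        /* version (high nibble), header length in words (low) */
    uint16_t ip_len;        /* total length, network byte order */
    uint16_t ip_off;        /* flags and fragment offset, network byte order */
};

#define MODEL_IP_MF      0x2000     /* more-fragments flag */
#define MODEL_IP_OFFMASK 0x1fff     /* fragment offset mask */

/* Header length in bytes. */
unsigned
ip_header_len(const struct ip_hdr_model *ip)
{
    return ((unsigned)(ip->ip_vhl & 0x0f) << 2);
}

/* Total length in host order; the header itself is not modified. */
uint16_t
ip_total_len(const struct ip_hdr_model *ip)
{
    return (ntohs(ip->ip_len));
}

/* Payload length computed on demand instead of trimming ip_len in place. */
uint16_t
ip_payload_len(const struct ip_hdr_model *ip)
{
    uint16_t total = ntohs(ip->ip_len);
    unsigned hlen = ip_header_len(ip);

    assert(total >= hlen);
    return ((uint16_t)(total - hlen));
}

/* True when the packet is a fragment (MF set or a non-zero offset). */
int
ip_is_fragment(const struct ip_hdr_model *ip)
{
    return ((ntohs(ip->ip_off) & (MODEL_IP_MF | MODEL_IP_OFFMASK)) != 0);
}
```

Because every reader converts for itself, no layer has to remember which byte order a previous layer left the header in.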
Updated by tuxillo over 2 years ago
- Description updated (diff)
- Target version changed from 6.4 to Unverifiable
The core dumps are for i386, which is no longer supported. Also, there is not enough information on how to reproduce it, so moving to Unverifiable.