Bug #1769

panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active

Added by pavalos about 4 years ago. Updated almost 4 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: sjg
% Done: 0%
Category: -
Target version: -

Description

panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
mp_lock = 00000001; cpuid = 1
Trace beginning at frame 0xd82db9b8
panic(ffffffff) at panic+0x14f
panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
transport_processing_oncpu(14,0,0,0,0) at transport_processing_oncpu+0x95
ip_input(e6b7e000) at ip_input+0xf11
ip_input_handler(e6b7e018) at ip_input_handler+0xe
netisr_run(2,e6b7e000) at netisr_run+0xdf
ether_demux_oncpu(d7e2c000,e6b7e000) at ether_demux_oncpu+0x37c
ether_input_oncpu(d7e2c000,e6b7e000) at ether_input_oncpu+0x138
ether_input_handler(e6b7e018) at ether_input_handler+0x102
netmsg_service(e6b7e018,1,0,1,ff8083d4) at netmsg_service+0x9d
tcpmsg_service_loop(0,0,0,0,0) at tcpmsg_service_loop+0x43
lwkt_exit() at lwkt_exit
boot() called on cpu#1
Uptime: 14d22h16m8s
Physical memory: 2043 MB
Dumping 487 MB: 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8

_get_mycpu (di=0xc0466b60) at ./machine/thread.h:83
83 __asm ("movl %%fs:globaldata,%0" : "=r" (gd) : "m"(__mycpu__dummy));

ylem:/var/crash# uname -a
DragonFly ylem.theshell.com 2.7-DEVELOPMENT DragonFly v2.7.3.8.g3219b-DEVELOPMENT #28: Fri May 7 09:16:10 HST 2010 :/usr/obj/usr/src/sys/YLEM i386

This happened twice while switching back and forth between lighttpd
and nginx. Kernel and cores being uploaded to
leaf:~pavalos/crash/*30 and *31

--Peter

History

#1 Updated by dillon about 4 years ago

:panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
:mp_lock = 00000001; cpuid = 1
:Trace beginning at frame 0xd82db9b8
:panic(ffffffff) at panic+0x14f
:panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
:tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
:tcp_input(e6b7e000,14,6) at tcp_input+0x3d63

This one is really difficult to track down even with the
kernel core. I think the only real way to do it is to add
assertions near the top of tcp_input() and tcp_output() after
the tp is looked up to assert that tp->tt_msg->tt_cpuid ==
mycpu->gd_cpuid, to try to catch the problem earlier in the
procedure chain.
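
A minimal sketch of the kind of early check described above, written as a userland model rather than kernel code. The struct and function names (tcp_timer_msg_model, tcpcb_model, tcp_owner_ok, tcp_path_entry_check) are hypothetical stand-ins, not the kernel's definitions; only the tt_msg->tt_cpuid comparison is taken from the panic message. In the kernel the check would presumably be an assertion rather than an error return.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct tcp_timer_msg_model {
    int tt_cpuid;                       /* cpu that owns the timer message */
};

struct tcpcb_model {
    struct tcp_timer_msg_model *tt_msg;
};

/* Returns 1 when the pcb's timer message is owned by the given cpu. */
static int
tcp_owner_ok(const struct tcpcb_model *tp, int mycpuid)
{
    return (tp->tt_msg->tt_cpuid == mycpuid);
}

/*
 * Model of a check placed right after the tp lookup at the top of an
 * input or output path.  Reports which path saw the mismatch so the
 * failure is caught before any timer code runs.
 * Returns 0 on success, -1 on an ownership mismatch.
 */
static int
tcp_path_entry_check(const struct tcpcb_model *tp, int mycpuid,
    const char *path)
{
    if (!tcp_owner_ok(tp, mycpuid)) {
        fprintf(stderr, "%s: tp owned by cpu %d, running on cpu %d\n",
            path, tp->tt_msg->tt_cpuid, mycpuid);
        return (-1);
    }
    return (0);
}
```

Checking at entry narrows the search: if the entry check passes but the later assertion in tcp_callout_active still fires, the ownership changed inside the function; if the entry check fails, the packet was dispatched to the wrong cpu before the tcp code ever ran.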

Even worse, we still have ipv6 hacks for the tcp stack that
put all ipv6 transport processing on cpu 0, and ipv6->ipv4
conversion hacks for connections that screw up the model.
It's a real mess, frankly.

-Matt

#2 Updated by aoiko about 4 years ago

On 26/05/2010 02:42 AM, Matthew Dillon wrote:
> :panic: assertion: _tp->tt_msg->tt_cpuid == mycpuid in tcp_callout_active
> :mp_lock = 00000001; cpuid = 1
> :Trace beginning at frame 0xd82db9b8
> :panic(ffffffff) at panic+0x14f
> :panic(c037a20a,c03a4300,c036edf8,e100,0) at panic+0x14f
> :tcp_output(e3462208,e6b7e000) at tcp_output+0xe9a
> :tcp_input(e6b7e000,14,6) at tcp_input+0x3d63
>
> This one is really difficult to track down even with the
> kernel core. I think the only real way to do it is to add
> assertions near the top of tcp_input() and tcp_output() after
> the tp is looked up to assert that tp->tt_msg->tt_cpuid ==
> mycpu->gd_cpuid, to try to catch the problem earlier in the
> procedure chain.

We could also use ktr to log sent/delivered netmsgs. That and a large
value for KTR_ENTRIES should give us enough info to debug the problem.
I've often thought of doing that while debugging netmp, and libevtr will
make analyzing the data much easier.
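
To illustrate the idea, here is a rough userland sketch of a fixed-size event ring that records message sends and deliveries per cpu. It does not use the real ktr interface; evlog, evlog_record and evlog_last_sender are hypothetical names, and EVLOG_ENTRIES only plays the role that KTR_ENTRIES would play in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EVLOG_ENTRIES 8     /* must be a power of two; larger keeps more history */

struct evlog_entry {
    uint64_t    seq;        /* global sequence number of the event */
    int         cpuid;      /* cpu on which the event was recorded */
    int         sent;       /* 1 = message sent, 0 = message delivered */
    const void  *msg;       /* identity of the message */
};

struct evlog {
    struct evlog_entry  e[EVLOG_ENTRIES];
    uint64_t            next;   /* sequence number of the next event */
};

/* Record one event, overwriting the oldest entry once the ring is full. */
static void
evlog_record(struct evlog *l, int cpuid, int sent, const void *msg)
{
    struct evlog_entry *ent = &l->e[l->next & (EVLOG_ENTRIES - 1)];

    ent->seq = l->next++;
    ent->cpuid = cpuid;
    ent->sent = sent;
    ent->msg = msg;
}

/*
 * Walk the ring from newest to oldest and return the cpu that most
 * recently sent msg, or -1 if that event is no longer in the ring.
 */
static int
evlog_last_sender(const struct evlog *l, const void *msg)
{
    uint64_t count = l->next < EVLOG_ENTRIES ? l->next : EVLOG_ENTRIES;
    uint64_t i;

    for (i = 1; i <= count; i++) {
        const struct evlog_entry *ent =
            &l->e[(l->next - i) & (EVLOG_ENTRIES - 1)];

        if (ent->msg == msg && ent->sent)
            return (ent->cpuid);
    }
    return (-1);
}
```

The ring size matters for the reason given above: with too few entries, the send that caused a bad delivery is overwritten before the panic, and the lookup comes back empty.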

Aggelos

#3 Updated by dillon about 4 years ago

I've decided to undertake some major reworking of our network
stack. Well, it is really more a continuation of the work that
was done last year.

I already have the protosw cleaned up and ip_off and ip_len fixed
up so they are left in network byte order (instead of switching back
and forth). I have some more work to do with ip_len to get rid of
the header length trimming code and a ton of other stuff. This
will remove all the packet back-and-forth munging that is being done
now.
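
As a small illustration of the "leave it in network byte order" approach, the sketch below keeps the header fields untouched and converts only at the point of use. The struct and helper names (ip_hdr_model, ip_total_len, ip_is_fragment) are hypothetical and are not the actual code on the branch; the flag and mask values are the standard IPv4 ones.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Simplified header: both fields stay in network byte order at all times. */
struct ip_hdr_model {
    uint16_t ip_len;    /* total length, network byte order */
    uint16_t ip_off;    /* flags + fragment offset, network byte order */
};

#define IP_MF_MODEL         0x2000  /* "more fragments" flag */
#define IP_OFFMASK_MODEL    0x1fff  /* fragment offset bits */

/* Convert on read instead of rewriting the field in the packet. */
static unsigned
ip_total_len(const struct ip_hdr_model *ip)
{
    return (ntohs(ip->ip_len));
}

/* A packet is a fragment if MF is set or the offset is non-zero. */
static int
ip_is_fragment(const struct ip_hdr_model *ip)
{
    return ((ntohs(ip->ip_off) &
        (IP_MF_MODEL | IP_OFFMASK_MODEL)) != 0);
}
```

Because nothing writes a host-order value back into the header, code further down the path never has to know which order an earlier layer left the fields in.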

This all leads up to being able to remove all the special cpu selection
cases in the individual protocol stacks and in particular cleaning up
the tcp syncache, and doing unconditional cpu selection closer to the
netif.
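
One way to picture unconditional cpu selection is a single function, applied to every packet early on, that maps the connection tuple to a cpu. The sketch below is only an assumption about the general shape; pick_cpu and its mixing constants are hypothetical and are not the hash the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a connection 4-tuple to a cpu index in [0, ncpus).
 * The same tuple always yields the same cpu, so every packet of a
 * connection lands on the cpu that owns its pcb and timers.
 * ncpus must be non-zero.
 */
static unsigned
pick_cpu(uint32_t faddr, uint16_t fport, uint32_t laddr, uint16_t lport,
    unsigned ncpus)
{
    uint32_t h = faddr ^ laddr ^ (((uint32_t)fport << 16) | lport);

    /* Cheap integer mixing so nearby ports do not cluster on one cpu. */
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return (h % ncpus);
}
```

With one selection point there is no per-protocol special case that can disagree with it, which is the class of mismatch the assertion in this report is catching.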

When I get IPV4 stable I'll put the git branch up on leaf for people to
test. IPV6, IPV4 & IPV6 fragmentation, ICMP, and IPSEC will need a lot
of testing. Also PF (particularly NAT) and IPFW. There will be
lots of things needing testing.

-Matt

#4 Updated by sjg almost 4 years ago

grab
