Bug #1993

panic: assertion: so->so_port == &curthread->td_msgport in tcp_input

Added by pavalos almost 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Assignee: sepherosa
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

I got this panic today on a box running master from Feb. 4th:

panic: assertion: so->so_port == &curthread->td_msgport in tcp_input

(kgdb) bt
#0 _get_mycpu (di=0xc0400c80) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc0400c80) at /usr/src/sys/platform/pc32/i386/dump_machdep.c:264
#2 0xc0197de2 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:893
#3 0xc01983a2 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:388
#4 0xc019866d in panic (fmt=0xc03525b8 "assertion: %s in %s") at /usr/src/sys/kern/kern_shutdown.c:799
#5 0xc022a433 in tcp_input (mp=0xd8a5ecb4, offp=0xd8a5ecb0, proto=6) at /usr/src/sys/netinet/tcp_input.c:1094
#6 0xc0223884 in transport_processing_oncpu (m=0x0, hlen=20, ip=0xe6f93010) at /usr/src/sys/netinet/ip_input.c:396
#7 0xc0225655 in ip_input (m=0xe09a5600) at /usr/src/sys/netinet/ip_input.c:961
#8 0xc022567d in ip_input_handler (msg=0xe09a561c) at /usr/src/sys/netinet/ip_input.c:417
#9 0xc0215750 in netmsg_service_loop (arg=0x0) at /usr/src/sys/net/netisr.c:307
#10 0xc01a1580 in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:276

It happens when I restart nginx. I tried killing nginx, then waiting for
about 3 minutes. A few seconds after restarting nginx, I get the panic.
Files can be downloaded from:
http://www.theshell.com/~pavalos/crash/ylem-crash6.tar.xz

--Peter

History

#1 Updated by sepherosa almost 4 years ago

On Sat, Feb 19, 2011 at 5:26 PM, Peter Avalos <> wrote:
> It happens when I restart nginx.  I tried killing nginx, then waiting for
> about 3 minutes.  A few seconds after restarting nginx, I get the panic.
> Files can be downloaded from:
> http://www.theshell.com/~pavalos/crash/ylem-crash6.tar.xz

The crash seems to be truncated. I tried on leaf and my own box,
there are 2010 bytes missing.

Best Regards,
sephe

#2 Updated by sepherosa almost 4 years ago

On Sat, Feb 19, 2011 at 11:03 PM, Sepherosa Ziehau <> wrote:
> On Sat, Feb 19, 2011 at 5:26 PM, Peter Avalos <> wrote:
>> It happens when I restart nginx.  I tried killing nginx, then waiting for
>> about 3 minutes.  A few seconds after restarting nginx, I get the panic.
>> Files can be downloaded from:
>> http://www.theshell.com/~pavalos/crash/ylem-crash6.tar.xz
>
> The crash seems to be truncated.  I tried on leaf and my own box,
> there are 2010 bytes missing.

Well, the received part still could work :) I am looking at it.

#3 Updated by pavalos almost 4 years ago

On Sat, Feb 19, 2011 at 11:03:50PM +0800, Sepherosa Ziehau wrote:
> > http://www.theshell.com/~pavalos/crash/ylem-crash6.tar.xz
>
> The crash seems to be truncated. I tried on leaf and my own box,
> there are 2010 bytes missing.
>

That's weird. Well, I've re-tar'd (that's a sweet word) it, as well as
the other crashes I got:

http://www.theshell.com/~pavalos/crash/

ylem-crash[6-12].tar.xz

--Peter

#4 Updated by sepherosa almost 4 years ago

On Sun, Feb 20, 2011 at 5:44 AM, Peter Avalos <> wrote:
> That's weird.  Well, I've re-tar'd (that's a sweet word) it, as well as
> the other crashes I got:
>
> http://www.theshell.com/~pavalos/crash/
>
> ylem-crash[6-12].tar.xz

Well, crash6 looks strange in the following ways:
1) tcp_usr_listen seems to have been called twice on the same inp
2) an "accepted" tcp socket is duplicated onto the wildcardhash, and it
seems to be linked into connhash too. The socket is in FIN_WAIT_1

crash6 is caused by 2), though 1) is problematic too.

I will add some assertions so that the panic happens earlier.

Best Regards,
sephe

#5 Updated by sepherosa almost 4 years ago

On Mon, Feb 21, 2011 at 10:11 AM, Sepherosa Ziehau <> wrote:
> Well, crash6 looks strange in the following ways:
> 1) tcp_usr_listen seems to have been called twice on the same inp
> 2) an "accepted" tcp socket is duplicated onto the wildcardhash, and it
> seems to be linked into connhash too.  The socket is in FIN_WAIT_1
>
> crash6 is caused by 2), though 1) is problematic too.
>
> I will add some assertions so that the panic happens earlier.

It turns out that 2) is caused by 1):
1] The application calls listen on the socket twice, so the old code
creates two inpcontainers that point to the same inp.
2] When the application stops, it closes the socket, but only one
inpcontainer is unlinked, so the second inpcontainer is left pointing
to a dangling inp.
3] After the application starts again, one of the newly accepted
sockets reuses the old listen socket's inp memory location.
4] The dangling inp that the inpcontainer points to is now the new, valid
accepted socket, so an accepted socket "mysteriously" appears on the
wildcardhash. In the crash, the accepted socket is on cpu0, while the
lookup happens on cpu1.
5] The inp lookup in tcp_input finds the wildcard inp, but it prefers a
laddr and lport match over an lport-only match, so the accepted socket
is returned.
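
The sequence above can be replayed in a small user-space model. This is only a sketch under loud assumptions: struct inp, struct inpcontainer, the single-bucket wildcardhash, listen_old, listen_guarded, unlink_first, lookup and replay are all hypothetical, simplified stand-ins, not the kernel's definitions, and listen_guarded is merely one plausible way to avoid the duplicate container; it is not claimed to be what the commit referenced in this report does. Memory reuse is modelled by overwriting a struct in place rather than by freeing and reallocating it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real kernel structures. */
struct inp {
    unsigned laddr;        /* 0 = wildcard local address */
    unsigned short lport;
    int cpu;               /* CPU whose netisr thread owns the socket */
};

struct inpcontainer {
    struct inp *ic_inp;
    struct inpcontainer *next;
};

static struct inpcontainer *wildcardhash;  /* one bucket, for brevity */

/* Old behaviour (step 1): every listen links a fresh container. */
static void listen_old(struct inp *inp)
{
    struct inpcontainer *ic = malloc(sizeof(*ic));
    assert(ic != NULL);
    ic->ic_inp = inp;
    ic->next = wildcardhash;
    wildcardhash = ic;
}

/* Assumed guard: a second listen on an already linked inp is a no-op. */
static void listen_guarded(struct inp *inp)
{
    for (struct inpcontainer *ic = wildcardhash; ic != NULL; ic = ic->next)
        if (ic->ic_inp == inp)
            return;
    listen_old(inp);
}

/* Close (step 2): unlink only the first container that refers to inp. */
static void unlink_first(struct inp *inp)
{
    for (struct inpcontainer **pp = &wildcardhash; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ic_inp == inp) {
            struct inpcontainer *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Lookup (step 5): an laddr+lport match beats an lport-only match. */
static struct inp *lookup(unsigned laddr, unsigned short lport)
{
    struct inp *wild = NULL;
    for (struct inpcontainer *ic = wildcardhash; ic != NULL; ic = ic->next) {
        struct inp *p = ic->ic_inp;
        if (p->lport != lport)
            continue;
        if (p->laddr == laddr)
            return p;
        if (p->laddr == 0 && wild == NULL)
            wild = p;
    }
    return wild;
}

/* Replays steps 1-5; returns the owning CPU of the inp a lookup finds. */
static int replay(void (*do_listen)(struct inp *))
{
    static struct inp slot, relisten;

    while (wildcardhash != NULL)              /* start from an empty bucket */
        unlink_first(wildcardhash->ic_inp);

    slot = (struct inp){ .laddr = 0, .lport = 80, .cpu = 1 };
    do_listen(&slot);
    do_listen(&slot);                         /* step 1: listen twice */
    unlink_first(&slot);                      /* step 2: application stops */

    relisten = (struct inp){ .laddr = 0, .lport = 80, .cpu = 1 };
    do_listen(&relisten);                     /* application restarts */

    /* Step 3: model memory reuse by overwriting the old slot in place
     * (no real free, so the sketch has no undefined behaviour). */
    slot = (struct inp){ .laddr = 0x0a000001, .lport = 80, .cpu = 0 };

    struct inp *found = lookup(0x0a000001, 80);   /* steps 4-5, "on cpu1" */
    return found != NULL ? found->cpu : -1;
}
```

In this model, replay(listen_old) returns 0: the lookup that notionally runs on cpu1 is handed the accepted socket owned by cpu0, which is the kind of mismatch the so_port assertion catches. replay(listen_guarded) returns 1, because only the new listen socket remains on the wildcard bucket.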

I believe it is fixed by:
f7b29de575ce11c14e6363518ea4d28267cf3ccc

Best Regards,
sephe
