Bug #937: tcp_sack related panic - DragonFlyBSD - DragonFlyBSD bugtracker

Bug #937

closed

tcp_sack related panic

Added by pavalos about 18 years ago. Updated about 17 years ago.

Status: Closed
Priority: Normal
Description

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000000; cpuid = 0; lapic.id = 00000000
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc0233d36
stack pointer = 0x10:0xdaa45a70
frame pointer = 0x10:0xdaa45a80
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
current thread = pri 12
<- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 00000000; cpuid = 0
boot() called on cpu#0
Uptime: 3d11h5m38s

dumping to dev #da/0x20001, blockno 378927

(kgdb) bt
#0 dumpsys () at ./machine/thread.h:83
#1 0xc01a2ea9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:375
#2 0xc01a316c in panic (fmt=0xc033781c "%s") at /usr/src/sys/kern/kern_shutdown.c:800
#3 0xc0310a61 in trap_fatal (frame=0xdaa45a28, eva=<value optimized out>) at /usr/src/sys/platform/pc32/i386/trap.c:1102
#4 0xc0310b9b in trap_pfault (frame=0xdaa45a28, usermode=0, eva=4) at /usr/src/sys/platform/pc32/i386/trap.c:1003
#5 0xc0311198 in trap (frame=0xdaa45a28) at /usr/src/sys/platform/pc32/i386/trap.c:686
#6 0xc02fe396 in calltrap () at /usr/src/sys/platform/pc32/i386/exception.s:783
#7 0xc0233d36 in sack_block_lookup (scb=0xdace6b0c, seq=1554912228, sb=0xdaa45a90) at /usr/src/sys/netinet/tcp_sack.c:128
#8 0xc0233eda in tcp_sack_nextseg (tp=0xdace6a20, nextrexmt=0xdaa45ad0, plen=0xdaa45ad4, lostdup=0xdaa45acc) at /usr/src/sys/netinet/tcp_sack.c:496
#9 0xc022f603 in tcp_sack_rexmt (tp=0xdace6a20, th=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:3154
#10 0xc0231aca in tcp_input (m=0xee2c5a00) at /usr/src/sys/netinet/tcp_input.c:1981
#11 0xc0229ae2 in transport_processing_oncpu (m=0xee2c5a00, hlen=20, ip=<value optimized out>, nexthop=0x0) at /usr/src/sys/netinet/ip_input.c:391
#12 0xc022bae0 in ip_input (m=0xee2c5a00) at /usr/src/sys/netinet/ip_input.c:1092
#13 0xc022bbb4 in ip_input_handler (msg0=0xee2c5a18) at /usr/src/sys/netinet/ip_input.c:421
#14 0xc0235653 in tcpmsg_service_loop (dummy=0x0) at /usr/src/sys/netinet/tcp_subr.c:385
#15 0xc01a9fa5 in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:214
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

uname -a
DragonFly ylem.theshell.com 1.11.0-DEVELOPMENT DragonFly 1.11.0-DEVELOPMENT #11: Mon Jan 28 18:13:59 EST 2008 root@ylem.theshell.com:/usr/obj/usr/src/sys/YLEM i386

Updated by dillon about 18 years ago

:#6 0xc02fe396 in calltrap () at /usr/src/sys/platform/pc32/i386/exception.s:783
:#7 0xc0233d36 in sack_block_lookup (scb=0xdace6b0c, seq=1554912228, sb=0xdaa45a90) at /usr/src/sys/netinet/tcp_sack.c:128
:#8 0xc0233eda in tcp_sack_nextseg (tp=0xdace6a20, nextrexmt=0xdaa45ad0, plen=0xdaa45ad4, lostdup=0xdaa45acc) at /usr/src/sys/netinet/tcp_sack.c:496
:#9 0xc022f603 in tcp_sack_rexmt (tp=0xdace6a20, th=<value optimized ou

Hmm.  I see two places where a node is removed from the sackblocks list
    but lastfound is not cleared when it points at that node.  I don't know
    if this is the issue but it's the most obvious candidate from looking
    at the failure.

I'll commit this tomorrow if no new developments come up.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>

Index: tcp_sack.c
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_sack.c,v
retrieving revision 1.6
diff -u -p -r1.6 tcp_sack.c
--- tcp_sack.c 22 Apr 2007 01:13:14 -0000 1.6
+++ tcp_sack.c 3 Feb 2008 01:32:16 -0000
@@ -176,7 +176,7 @@
     sb = TAILQ_FIRST(&scb->sackblocks);
     while (sb && SEQ_LEQ(sb->sblk_end, th_ack)) {
         nb = TAILQ_NEXT(sb, sblk_list);
-        if (sb == scb->lastfound)
+        if (scb->lastfound == sb)
             scb->lastfound = NULL;
         TAILQ_REMOVE(&scb->sackblocks, sb, sblk_list);
         free_sackblock(sb);
@@ -334,6 +334,8 @@ SEQ_GEQ(workingblock->sblk_end, sb-
         struct sackblock *nextblock;
 
         nextblock = TAILQ_NEXT(sb, sblk_list);
+        if (scb->lastfound == sb)
+            scb->lastfound = NULL;
         /* Remove completely overlapped block */
         TAILQ_REMOVE(&scb->sackblocks, sb, sblk_list);
         free_sackblock(sb);
@@ -346,6 +348,8 @@     if (sb != NULL &&
         SEQ_GEQ(workingblock->sblk_end, sb->sblk_start)) {
         /* Extend new block to cover partially overlapped old block. */
         workingblock->sblk_end = sb->sblk_end;
+        if (scb->lastfound == sb)
+            scb->lastfound = NULL;
         TAILQ_REMOVE(&scb->sackblocks, sb, sblk_list);
         free_sackblock(sb);
         --scb->nblocks;
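
For readers unfamiliar with the pattern, here is a minimal userland sketch of why a cached lookup hint has to be cleared before the node it points at is freed. It is not the kernel code: `struct block`, `struct board`, and the `board_*` helpers are hypothetical stand-ins for the sackblock scoreboard, sequence comparisons are plain integer comparisons with no wraparound handling, and `malloc`/`free` replace the zone allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the kernel's sackblock and scoreboard. */
struct block {
    unsigned start, end;            /* covers [start, end) */
    TAILQ_ENTRY(block) link;
};
TAILQ_HEAD(block_list, block);

struct board {
    struct block_list blocks;
    struct block *lastfound;        /* lookup hint; must never dangle */
    int nblocks;
};

static void board_init(struct board *b)
{
    TAILQ_INIT(&b->blocks);
    b->lastfound = NULL;
    b->nblocks = 0;
}

static struct block *board_add(struct board *b, unsigned start, unsigned end)
{
    struct block *blk = malloc(sizeof(*blk));

    if (blk == NULL)
        return NULL;
    blk->start = start;
    blk->end = end;
    TAILQ_INSERT_TAIL(&b->blocks, blk, link);
    b->nblocks++;
    return blk;
}

/* Every removal goes through here, so the hint is cleared in one place. */
static void board_remove(struct board *b, struct block *blk)
{
    assert(b->nblocks > 0);
    if (b->lastfound == blk)
        b->lastfound = NULL;
    TAILQ_REMOVE(&b->blocks, blk, link);
    free(blk);
    b->nblocks--;
}

/* Drop all leading blocks that end at or before 'ack'. */
static void board_ack(struct board *b, unsigned ack)
{
    struct block *blk = TAILQ_FIRST(&b->blocks), *next;

    while (blk != NULL && blk->end <= ack) {
        next = TAILQ_NEXT(blk, link);
        board_remove(b, blk);
        blk = next;
    }
}

/* Find the block containing 'seq', trying the cached hint first. */
static struct block *board_lookup(struct board *b, unsigned seq)
{
    struct block *blk = b->lastfound;

    if (blk != NULL && seq >= blk->start && seq < blk->end)
        return blk;
    TAILQ_FOREACH(blk, &b->blocks, link) {
        if (seq >= blk->start && seq < blk->end) {
            b->lastfound = blk;
            return blk;
        }
    }
    return NULL;
}
```

If `board_remove` skipped the `lastfound` check, the next `board_lookup` would dereference freed memory through the hint, which is the kind of failure the first backtrace suggests. Funnelling every removal through one helper is a design choice of this sketch, not something the patch above does.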

Updated by pavalos about 18 years ago

Also just got this with the same sources:

panic: zone: freeing free entry
mp_lock = 00000000; cpuid = 0
boot() called on cpu#0
Uptime: 1d11h35m59s

dumping to dev #da/0x20001, blockno 378927

(kgdb) bt
#0 dumpsys () at ./machine/thread.h:83
#1 0xc01a2ea9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:375
#2 0xc01a316c in panic (fmt=0xc034328a "zone: freeing free entry") at /usr/src/sys/kern/kern_shutdown.c:800
#3 0xc02a6aa8 in zerror (error=2) at /usr/src/sys/vm/vm_zone.c:567
#4 0xc02a6ff5 in zfree (z=0xd7049438, item=0xdb991760) at /usr/src/sys/vm/vm_zone.c:98
#5 0xc02341ac in tcp_sack_update_scoreboard (tp=0xdad397c0, to=0xdaa45be8) at /usr/src/sys/netinet/tcp_sack.c:165
#6 0xc02318d9 in tcp_input (m=0xeb7df200) at /usr/src/sys/netinet/tcp_input.c:1900
#7 0xc0229ae2 in transport_processing_oncpu (m=0xeb7df200, hlen=20, ip=<value optimized out>, nexthop=0x0) at /usr/src/sys/netinet/ip_input.c:391
#8 0xc022bae0 in ip_input (m=0xeb7df200) at /usr/src/sys/netinet/ip_input.c:1092
#9 0xc022bbb4 in ip_input_handler (msg0=0xeb7df218) at /usr/src/sys/netinet/ip_input.c:421
#10 0xc0235653 in tcpmsg_service_loop (dummy=0x0) at /usr/src/sys/netinet/tcp_subr.c:385
#11 0xc01a9fa5 in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:214
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Do you think it's the same problem?

Updated by pavalos about 18 years ago

FYI, the vmcores are on leaf:~pavalos/crash. The first one is *12 and
the 2nd is *13.

--Peter

Updated by dillon about 18 years ago

:Also just got this with the same sources:
:
:panic: zone: freeing free entry
:mp_lock = 00000000; cpuid = 0
:boot() called on cpu#0
:Uptime: 1d11h35m59s
:...
:#3 0xc02a6aa8 in zerror (error=2) at /usr/src/sys/vm/vm_zone.c:567
:#4 0xc02a6ff5 in zfree (z=0xd7049438, item=0xdb991760) at /usr/src/sys/vm/vm_zone.c:98
:#5 0xc02341ac in tcp_sack_update_scoreboard (tp=0xdad397c0, to=0xdaa45be8) at /usr/src/sys/netinet/tcp_sack.c:165
:#6 0xc02318d9 in tcp_input (m=0xeb7df200) at /usr/src/sys/netinet/tcp_input.c:1900
:#7 0xc0229ae2 in transport_processing_oncpu (m=0xeb7df200, hlen=20, ip=
:
:Do you think it's the same problem?

Same sources prior to the patch?  It's quite possible.

I tracked this second crash to line 321 of tcp_sack.c (the kgdb backtrace
    is all wrong due to all the inlining).  It's freeing 'newblock' here,
    which should always succeed at this particular point in the code.

I think this case can only occur if the list had previously been
    corrupted due to the hint not getting NULL'd out in those two places.

-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>
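
One way to catch this class of corruption earlier is a debug-only consistency check on the scoreboard. The sketch below is an illustration only and is not part of the committed fix: `struct block`, `struct board`, and `board_consistent` are hypothetical names, and it assumes the invariant implied by the discussion, namely that the hint is either NULL or points at a block still on the list, and that the block count matches the list length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the kernel's sackblock and scoreboard. */
struct block {
    unsigned start, end;
    TAILQ_ENTRY(block) link;
};
TAILQ_HEAD(block_list, block);

struct board {
    struct block_list blocks;
    struct block *lastfound;    /* lookup hint */
    int nblocks;
};

/*
 * Returns true when the hint is NULL or refers to a block that is still
 * on the list, and nblocks equals the number of listed blocks.
 */
static bool board_consistent(const struct board *b)
{
    const struct block *blk;
    bool hint_ok;
    int count = 0;

    assert(b != NULL);
    hint_ok = (b->lastfound == NULL);
    TAILQ_FOREACH(blk, &b->blocks, link) {
        if (blk == b->lastfound)
            hint_ok = true;
        count++;
    }
    return hint_ok && count == b->nblocks;
}
```

The check walks the whole list, so it only makes sense under a debugging option; calling it after each scoreboard update would flag a stale hint at the point it is created rather than at a later lookup or free.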

Updated by corecode about 17 years ago

did this get committed?

Updated by pavalos about 17 years ago

Committed in 9e3d6c9645ed28ef5b07a9b13e380e13a86deeb8. I haven't seen this
panic in about a year, so let's call it good.
