Bug #135
closedpanic
Added by Johannes.Hofmann over 18 years ago. Updated about 18 years ago.
0%
Description
After running for about 1h, my system repeatably crashes with
panic: m_copym, negative len -13
I've got a core, but backtrace seems not to work.
I'm running cvs as of 2006-26-03 09:56 and have no special filesystems
mounted.
Johannes
Updated by dillon over 18 years ago
:After running for about 1h, my system repeatably crashes with
:
:panic: m_copym, negative len -13
:
:I've got a core, but backtrace seems not to work.
:I'm running cvs as of 2006-26-03 09:56 and have no special filesystems
:mounted.
:
: Johannes
We are using kgdb now instead of gdb on the CVS head branch for
debugging kernel cores. Have you tried kgdb ?
The CVS head is not considered stable right now, but even so this is
the first time I have seen an m_copym panic reported in a long time.
If there is some way for you to make that kernel binary and vmcore
available I could take a look at it. If you can, please reply to
me privately (off-list) with info on how I can get download it.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by dillon over 18 years ago
:Now I know a bit more, about what's going on. I'm using IPSEC from
:my laptop to a router to protect my WLAN.
:I can easily reproduce the problem by accessing
:http://www.outdoor.ch/article112.html
:
:Then at in tcp_output() (sys/netinet/tcp_output.c) at
:
:#ifdef IPSEC
: ipoptlen += ipsec_hdrsiz_tcp(tp);
:#endif
:
: /*
: * Adjust data length if insertion of options will
: * bump the packet length beyond the t_maxopd length.
: * Clear the FIN bit because we cut off the tail of
: * the segment.
: /
: if (len + optlen + ipoptlen > tp->t_maxopd) {
: /
: * If there is still more to send, don't close the connection.
: */
: flags &= ~TH_FIN;
: len = tp->t_maxopd - optlen - ipoptlen;
: sendalot = TRUE;
: }
:
:
:len ends up beeing 13 because tp>t_maxopd = 64, optlen = 12,
:and ipoptlen = 65 (because ipsec_hdrsiz_tcp(tp) = 65.
:
:But I don't know, how to handle this case correctly.
:
: Johannes
I appreciate the sleuthing work! I've forwarded your email to our
networking expert, Jeffrey Hsu. If it isn't tracked down in the next
few days by him or someone else I'll have a go at it myself.
-Matt
Matthew Dillon
<dillon@backplane.com>
Updated by dillon over 18 years ago
: /*
: * Adjust data length if insertion of options will
: * bump the packet length beyond the t_maxopd length.
: * Clear the FIN bit because we cut off the tail of
: * the segment.
: /
: if (len + optlen + ipoptlen > tp->t_maxopd) {
: /
: * If there is still more to send, don't close the connection.
: */
: flags &= ~TH_FIN;
: len = tp->t_maxopd - optlen - ipoptlen;
: sendalot = TRUE;
: }
:
:len ends up beeing 13 because tp>t_maxopd = 64, optlen = 12,
:and ipoptlen = 65 (because ipsec_hdrsiz_tcp(tp) = 65.
:
:But I don't know, how to handle this case correctly.
:
: Johannes
Hmm. I looked at the crashdump you uploaded and at the definition
of tp->t_maxopd. t_maxopd is supposed to be the maximum packet
payload data + options. A value of 64 doesn't seem right, but
I manually tracked down the stack frame and dumped the tcpcb and
indeed, it is 64. t_maxseg is 52.
These numbers are way too low. They are derived from the
interface MTU, the route table, or from TCP options during handshaking.
I'm not sure where the numbers came from but that is probably our culprit.
The t_state is 5, which is TCPS_ESTABLISHED. So the question is now
how did the TCP connection wind up with such a small packet payload
limit?
-Matt
Matthew Dillon
<dillon@backplane.com>
(kgdb) print (struct tcpcb *)0xdd71a040
$37 = (struct tcpcb *) 0xdd71a040
(kgdb) print *$37
$38 = {
t_segq = {
lh_first = 0x0
},
t_dupacks = 0,
tt_cpu = 0,
tt_rexmt = 0xdd71a18c,
tt_persist = 0xdd71a1a8,
tt_keep = 0xdd71a1c4,
tt_2msl = 0xdd71a1e0,
tt_delack = 0xdd71a1fc,
t_inpcb = 0xdd719f80,
t_state = 5,
t_flags = 33761,
snd_up = 463293982,
snd_una = 463293983,
snd_recover = 463293982,
snd_max = 463293983,
snd_nxt = 463293983,
snd_wl1 = 1376273623,
snd_wl2 = 463293983,
iss = 463293982,
irs = 1376273622,
rcv_nxt = 1376273623,
rcv_adv = 1376330979,
rcv_wnd = 57356,
rcv_up = 1376273623,
snd_wnd = 49248,
snd_cwnd = 208,
snd_wacked = 0,
snd_ssthresh = 1073725440,
t_rxtcur = 120,
t_maxseg = 52,
t_srtt = 128,
t_rttvar = 32,
t_maxopd = 64,
t_rcvtime = 316664,
t_starttime = 316664,
t_rtttime = 0,
t_rtseq = 463293982,
t_rxtshift = 0,
t_rttmin = 100,
t_rttbest = 160,
t_rttupdated = 1,
max_sndwnd = 49248,
t_softerror = 0,
t_oobflags = 0 '\0',
t_iobc = 0 '\0',
snd_scale = 0 '\0',
rcv_scale = 0 '\0',
request_r_scale = 0 '\0',
requested_s_scale = 0 '\0',
ts_recent = 53239860,
ts_recent_age = 316664,
last_ack_sent = 0,
cc_send = 240,
cc_recv = 0,
snd_cwnd_prev = 0,
snd_wacked_prev = 0,
snd_ssthresh_prev = 0,
snd_recover_prev = 0,
t_badrxtwin = 0,
t_rexmtTS = 0,
snd_limited = 0 '\0',
rexmt_high = 0,
snd_max_rexmt = 0,
scb = {
nblocks = 0,
sackblocks = {
tqh_first = 0x0,
tqh_last = 0xdd71a130
},
lostseq = 0,
lastfound = 0x0
},
reportblk = {
rblk_start = 0,
rblk_end = 0
},
encloseblk = {
rblk_start = 0,
rblk_end = 0
},
nsackhistory = 0,
sackhistory = {{
rblk_start = 0,
rblk_end = 0
}, {
rblk_start = 0,
rblk_end = 0
}, {
rblk_start = 0,
rblk_end = 0
}, {
rblk_start = 0,
rblk_end = 0
}},
t_outputq = {
tqe_next = 0x0,
tqe_prev = 0x0
},
snd_bandwidth = 0,
snd_bwnd = 1073725440,
t_bw_rtttime = 0,
t_bw_rtseq = 0
}
Updated by hsu over 18 years ago
:len ends up being 13 because tp>t_maxopd = 64, optlen = 12,
:and ipoptlen = 65 (because ipsec_hdrsiz_tcp(tp) = 65.
:
:But I don't know, how to handle this case correctly.
A maxopd value of 64 seems really small. What's the link MTU as
shown by ifconfig?
Updated by Johannes.Hofmann over 18 years ago
Jeffrey Hsu <hsu@leaf.dragonflybsd.org> wrote:
:len ends up being
13 because tp>t_maxopd = 64, optlen = 12,
:and ipoptlen = 65 (because ipsec_hdrsiz_tcp(tp) = 65.
:
:But I don't know, how to handle this case correctly.A maxopd value of 64 seems really small. What's the link MTU as
shown by ifconfig?
ipw0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1492
inet 192.168.1.5 netmask 0xffffff00 broadcast 192.168.1.255
ether 00:0c:f1:3b:d1:ef
media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)
status: associated
Perhaps it's some negotiation with the remote host that reduces t_maxopd?
And the problem only occurs since I have upgarded my openwrt router...
Updated by dillon over 18 years ago
:ipw0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1492
: inet 192.168.1.5 netmask 0xffffff00 broadcast 192.168.1.255
: ether 00:0c:f1:3b:d1:ef
: media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)
: status: associated
:
:
:Perhaps it's some negotiation with the remote host that reduces t_maxopd?
:And the problem only occurs since I have upgarded my openwrt router...
A tcpdump may give us a clue. Something like:
tcpdump -i ipw0 -s 4096 -l -vvv
would do the job, but you need to filter out data unrelated to the TCP
connection causing the problem. So, e.g. you might have to add
'not port 22 and not port <blah> and not port <blah>' or things of
that nature.
I have enclosed a patch that I believe will stop the crashes. The TCP
connection will still fail, because its an unrecoverable situation (the
MTU must be large enough to hold the packet's options!), but hopefully
the system will not crash. It should report problem packets on the
console/dmesg (limited to one report per second).
-Matt
Matthew Dillon
<dillon@backplane.com>
Index: netinet/tcp_output.c
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.30
diff u -r1.30 tcp_output.c netinet/tcp_output.c 14 Jan 2006 11:33:50 -0000 1.30
--
+++ netinet/tcp_output.c 9 Apr 2006 17:34:48 -0000@ -631,18 +631,29
@
#endif
/*
- * Adjust data length if insertion of options will
- * bump the packet length beyond the t_maxopd length.
- * Clear the FIN bit because we cut off the tail of
- * the segment.
+ * Adjust data length if insertion of options will bump the packet
+ * length beyond the t_maxopd length. Clear FIN to prevent premature
+ * closure since there is still more data to send after this (now
+ * truncated) packet.
+
+ * If just the options do not fit we are in a no-win situation and
+ * we treat it as an unreachable host.
*/
if (len + optlen + ipoptlen > tp->t_maxopd) {
- /
- * If there is still more to send, don't close the connection.
- */
- flags &= ~TH_FIN;
- len = tp->t_maxopd - optlen - ipoptlen;
- sendalot = TRUE;
+ if (tp->t_maxopd <= optlen + ipoptlen) {
+ static time_t last_optlen_report;
if (last_optlen_report != time_second) {
+ last_optlen_report = time_second;
+ printf("tcpcb p: MSS (%d) too small to hold options!\n", tp, tp->t_maxopd);
+ }
+ error = EHOSTUNREACH;
+ goto out;
+ } else {
+ flags x%x= ~TH_FIN;
+ len = tp->t_maxopd - optlen - ipoptlen;
+ sendalot = TRUE;
+ }
}
#ifdef INET6
Updated by Johannes.Hofmann over 18 years ago
Matthew Dillon <dillon@apollo.backplane.com> wrote:
:ipw0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1492
: inet 192.168.1.5 netmask 0xffffff00 broadcast 192.168.1.255
: ether 00:0c:f1:3b:d1:ef
: media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)
: status: associated
:
:
:Perhaps it's some negotiation with the remote host that reduces t_maxopd?
:And the problem only occurs since I have upgarded my openwrt router...A tcpdump may give us a clue. Something like:
tcpdump -i ipw0 -s 4096 -l -vvv
would do the job, but you need to filter out data unrelated to the TCP
connection causing the problem. So, e.g. you might have to add
'not port 22 and not port <blah> and not port <blah>' or things of
that nature.
Unfortunately the traffic is encrypted by IPSEC, so there is not much,
one can see in the tcpdump output:
18:33:12.872804 IP (tos 0x10, ttl 64, id 3341, offset 0, flags [none], length: 104) blob > 192.168.1.1: ESP
18:33:12.876106 IP (tos 0x0, ttl 64, id 43795, offset 0, flags [none], length: 152) 192.168.1.1 > blob: ESP
18:33:12.971780 IP (tos 0x10, ttl 64, id 3343, offset 0, flags [none], length: 104) blob > 192.168.1.1: ESP
and so on...
I have enclosed a patch that I believe will stop the crashes. The TCP
connection will still fail, because its an unrecoverable situation (the
MTU must be large enough to hold the packet's options!), but hopefully
the system will not crash. It should report problem packets on the
console/dmesg (limited to one report per second).
The patch fixes the core dumps, thanks. I think the other thing is a
bug or misconfiguration in my router. I will try to track this down
later.
Johannes
PS: Sorry for the late reply. Weather and snow have been too good here :-)
-Matt
Matthew Dillon
<dillon@backplane.com>Index: netinet/tcp_output.c ===================================================================
RCS file: /cvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.30
diffu -r1.30 tcp_output.cnetinet/tcp_output.c 14 Jan 2006 11:33:50 -0000 1.30
--
+++ netinet/tcp_output.c 9 Apr 2006 17:34:48 -0000@ -631,18 +631,29
@
#endif/*
- * Adjust data length if insertion of options will
- * bump the packet length beyond the t_maxopd length.
- * Clear the FIN bit because we cut off the tail of
- * the segment.
+ * Adjust data length if insertion of options will bump the packet
+ * length beyond the t_maxopd length. Clear FIN to prevent premature
+ * closure since there is still more data to send after this (now
+ * truncated) packet.
+
+ * If just the options do not fit we are in a no-win situation and
+ * we treat it as an unreachable host.
/
if (len + optlen + ipoptlen > tp->t_maxopd) {
- /*
- * If there is still more to send, don't close the connection.
- */
- flags &= ~TH_FIN;
- len = tp->t_maxopd - optlen - ipoptlen;
- sendalot = TRUE;
+ if (tp->t_maxopd <= optlen + ipoptlen) {
+ static time_t last_optlen_report;
if (last_optlen_report != time_second) {
+ last_optlen_report = time_second;
+ printf("tcpcb p: MSS (%d) too small to hold options!\n", tp, tp->t_maxopd);
+ }
+ error = EHOSTUNREACH;
+ goto out;
+ } else {
+ flags x%x= ~TH_FIN;
+ len = tp->t_maxopd - optlen - ipoptlen;
+ sendalot = TRUE;
+ }
}#ifdef INET6