DragonFlyBSD bugtracker (https://bugs.dragonflybsd.org/), updated 2007-04-11T13:04:56Z
DragonFlyBSD - Bug #599: 1.9.0 reproducable panic | https://bugs.dragonflybsd.org/issues/599?journal_id=2486 | 2007-04-11T13:04:56Z | pavalos
<ul></ul><p>I forgot to mention that the kernel and cores are *.3 and *.4.</p> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=24942007-04-12T01:43:01Zdillon
<ul></ul><p>:New submission from Peter Avalos &lt;<a class="email" href="mailto:pavalos@theshell.com">pavalos@theshell.com</a>&gt;:<br />:<br />:Here's a panic I'm getting with some pretty serious network (www) load, then <br />:doing a netstat -an:<br />:<br />:Unread portion of the kernel message buffer:<br />:panic: m_copydata, negative off -1<br />:mp_lock = 00000000; cpuid = 0; lapic.id = 00000000<br />:boot() called on cpu#0<br />:<br />:syncing disks... 5<br />:done<br />:Uptime: 12d22h0m32s</p>
<pre><code>Woa. You mean the panic occurs only when you do the netstat -an command<br /> under heavy network load? It doesn't happen any other time?</code></pre>
<pre><code>This is a really odd crash. Somehow tp->snd_nxt has become less<br /> than tp->snd_una, causing 'off' to be calculated as -1.</code></pre>
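<p>For readers following along, here is a minimal sketch of the arithmetic behind the panic message. It assumes the send offset is derived as snd_nxt minus snd_una, as the comment above describes. The struct and function names (tcpcb_sketch, seq_lt, send_offset, offset_is_valid, checked_send_offset) are hypothetical and are not the kernel's own code.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the TCP control block. */
struct tcpcb_sketch {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to be sent */
};

/* Wrap-safe "a < b" for 32-bit TCP sequence numbers. */
static int seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Offset into the send buffer where the next segment would start. */
static int32_t send_offset(const struct tcpcb_sketch *tp)
{
    return (int32_t)(tp->snd_nxt - tp->snd_una);
}

/* 1 if snd_nxt is at or ahead of snd_una, 0 if it has fallen behind. */
static int offset_is_valid(const struct tcpcb_sketch *tp)
{
    return !seq_lt(tp->snd_nxt, tp->snd_una);
}

/* Assumed analogue of a "negative off" check: abort on a bad state. */
static int32_t checked_send_offset(const struct tcpcb_sketch *tp)
{
    assert(offset_is_valid(tp));
    return send_offset(tp);
}
```

<p>With snd_una = 1000 and snd_nxt = 999, send_offset() yields -1, the value reported in the panic, and checked_send_offset() would abort on that state. A sequence-number wrap (snd_una near 2^32, snd_nxt small) still produces a small positive offset, so wrap-around alone does not explain the -1.</p>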
<pre><code>-Matt</code></pre> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=24972007-04-12T05:24:01Zpavalos
<ul></ul><p>Correct. Once I execute "netstat -an" it panics. Any ideas?</p>
<p>--Peter</p> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=24982007-04-12T05:56:00Zdillon
<ul></ul><p>:On Wed, Apr 11, 2007 at 11:38:48AM -0700, Matthew Dillon wrote:<br />:><br />:> Woa. You mean the panic occurs only when you do the netstat -an command<br />:> under heavy network load? It doesn't happen any other time?<br />:><br />:<br />:Correct. Once I execute "netstat -an" it panics. Any ideas?<br />:<br />:--Peter</p>
<pre><code>That's really odd. I looked at the code and found one possible<br /> place where the field could get out of whack. Try turning off the<br /> tcp limited transmit code:</code></pre>
<pre><code>(in /etc/sysctl.conf):<br /> net.inet.tcp.limitedtransmit=0</code></pre>
<pre><code>and reboot to clean out any preexisting tcp connections (or otherwise<br /> clean them out manually by killing and restarting the services).</code></pre>
<pre><code>I'm a bit at a loss as to why netstat -an would trigger the problem,<br /> though. We do know that anything that accesses /dev/kmem heavily,<br /> like fstat, can crash the machine while chasing down stale pointers<br /> in kernel memory. But this panic seems a bit at odds with the sort<br /> of crash I would expect from stale pointer chasing.</code></pre>
<pre><code>-Matt</code></pre> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=25082007-04-12T12:57:01Zjoerg
<ul></ul><p>netstat -an uses a sysctl interface though.</p>
<p>Joerg</p> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=25112007-04-13T00:19:01Zdillon
<ul></ul><p>:On Wed, Apr 11, 2007 at 03:52:30PM -0700, Matthew Dillon wrote:<br />:> I'm a bit at a loss as to why netstat -an would trigger the problem,<br />:> though. We do know that anything that accesses /dev/kmem heavily,<br />:> like fstat, can crash the machine while chasing down stale pointers<br />:> in kernel memory. But this panic seems a bit at odds with the sort<br />:> of crash I would expect from stale pointer chasing.<br />:<br />:netstat -an uses a sysctl interface though.<br />:<br />:Joerg</p>
<pre><code>That would make more sense. I was scratching my head at how a KVM<br /> access could cause this; a direct sysctl interface is more likely.</code></pre>
<pre><code>I don't see a whole lot in the sysctl code either, unfortunately.<br /> e.g. tcp_pcblist() in tcp_subr.c. There is one likely possibility.<br /> Because the sysctl is dumping its huge, huge list in one large go<br /> and holding the big giant lock while it does it, it could be<br /> preventing the TCP stack's callouts (which is where the panic occurred)<br /> from running during that period. There could be a race condition there<br /> that we are not handling properly.</code></pre>
<pre><code>So, e.g., some sort of race in softclock_handler() in kern_timeout.c<br /> related to the acquisition of the big giant lock.</code></pre>
<pre><code>-Matt<br /> Matthew Dillon <br /> &lt;<a class="email" href="mailto:dillon@backplane.com">dillon@backplane.com</a>&gt;</code></pre> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=25172007-04-17T00:42:02Zdillon
<ul></ul><p>Any luck turning off that sysctl?</p>
<pre><code>-Matt</code></pre> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=25182007-04-17T05:50:01Zpavalos
<ul></ul><p>I have turned off the sysctl, but I'm having a hard time generating the<br />traffic needed that originally caused the panic. (I'm estimating it was<br />about 6000 simultaneous http connections, but I'm not exactly sure since<br />netstat wasn't working.)</p>
<p>--Peter</p> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=25192007-04-18T00:33:01Zdillon
<ul></ul><p>:traffic needed that originally caused the panic. (I'm estimating it was<br />:about 6000 simultaneous http connections, but I'm not exactly sure since<br />:netstat wasn't working.)<br />:<br />:--Peter</p>
<pre><code>Ok, for now please continue with the sysctl turned off (put it in your<br /> /etc/sysctl.conf). That way if the crash occurs again in the future<br /> we can discount that part of the tcp stack.</code></pre>
<pre><code>I am going to go ahead and commit a fix to the possible bug related<br /> to the code in question.</code></pre>
<pre><code>-Matt<br /> Matthew Dillon <br /> &lt;<a class="email" href="mailto:dillon@backplane.com">dillon@backplane.com</a>&gt;</code></pre> DragonFlyBSD - Bug #599: 1.9.0 reproducable panichttps://bugs.dragonflybsd.org/issues/599?journal_id=94262010-12-22T09:08:24Zpavalos
<ul></ul><p>net.inet.tcp.limitedtransmit=1 definitely causes instability, even on the latest <br />master:</p>
<p>DragonFly ylem.theshell.com 2.9-DEVELOPMENT DragonFly v2.9.1.321.gadb6af-DEVELOPMENT #31: Tue Dec 21 13:12:36 HST 2010 <br /><a class="email" href="mailto:root@ylem.theshell.com">root@ylem.theshell.com</a>:/usr/obj/usr/src/sys/YLEM i386</p>
<p>Unfortunately I keep getting secondary panics, so I'm not able to get a vmcore.</p>