Bug #1824
open
kernel panic, x86, 2.7.3.859.ge5104
Added by akirchhoff135014 about 14 years ago.
Updated over 11 years ago.
Description
SMP enabled, otherwise a GENERIC kernel.
Full backtrace:
(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at
/usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf
chain")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at
/usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at
/usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>,
m=0xe20ca900, nam=0x0, control=0x0,
td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at
/usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1, mplocked=0)
at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at
/usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I was building some packages in the background at the time. I was ssh'ed in
from a linux box, and running x2x to allow me to move the mouse and keyboard
from the linux box to the DF box. At the time of the lock up, I was moving
the mouse around on the DF box. Though I have no idea if that's related to
the crash, that's the only thing that comes to mind.
Adam
On Tuesday 07 September 2010 19:22:57 Adam K Kirchhoff wrote:
SMP enabled, otherwise a GENERIC kernel.
Full backtrace:
(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at
/usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf
chain")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at
/usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at
/usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>,
m=0xe20ca900, nam=0x0, control=0x0,
td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at
/usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1,
mplocked=0) at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at
/usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address
0x8 ) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I was building some packages in the background at the time. I was ssh'ed
in from a linux box, and running x2x to allow me to move the mouse and
keyboard from the linux box to the DF box. At the time of the lock up, I
was moving the mouse around on the DF box. Though I have no idea if
that's related to the crash, that's the only thing that comes to mind.
Adam
The kernel and core are in my ~/crash folder on leaf.
Adam
:> I was building some packages in the background at the time. I was ssh'ed
:> in from a linux box, and running x2x to allow me to move the mouse and
:> keyboard from the linux box to the DF box. At the time of the lock up, I
:> was moving the mouse around on the DF box. Though I have no idea if
:> that's related to the crash, that's the only thing that comes to mind.
:>
:> Adam
:
:The kernel and core are in my ~/crash folder on leaf.
:
:Adam
Very interesting. Somehow the so_snd sockbuf has fewer bytes worth
of mbufs than it says in its sb_cc field. I have no idea how it
managed to get into that state. We will have to keep an eye on
things and collect more information.
-Matt
Matthew Dillon
<dillon@backplane.com>
Adam, was the machine that crashed serving NFS ? I was able to
reproduce the exact same crash while serving NFS.
-Matt
On Wed, 8 Sep 2010 10:12:00 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:
Adam, was the machine that crashed serving NFS ? I was able to
reproduce the exact same crash while serving NFS.
-Matt
The NFS server was running, but nothing was connected at the time.
I've pushed a bunch of work, please update to the latest master
and continue testing!
No smoking gun but I suspect machine load may be causing m_reclaim()
to get run, which drains various protocol caches. Those caches were
not MPSAFE. The changes address that issue.
If these changes don't fix the problem then compile a fresh kernel
with two options added to your kernel config:
options SOCKBUF_DEBUG
options MBUF_DEBUG
These are fairly invasive options so only compile them in if the
problems have not gone away. Hopefully then we will get a panic
closer to where the actual bad code is instead of well after the
fact.
-Matt
- Description updated (diff)
- Status changed from New to Feedback
- Assignee deleted (
0)
Hi Adam,
Did you have a chance to try what Matt said? Can you please provide feedback about it?
Thanks,
Antonio Huete
Also available in: Atom
PDF