Bug #1824
kernel panic, x86, 2.7.3.859.ge5104
Description
SMP enabled, otherwise a GENERIC kernel.
Full backtrace:
(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at
/usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf
chain")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at
/usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at
/usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>,
m=0xe20ca900, nam=0x0, control=0x0,
td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at
/usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1, mplocked=0)
at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at
/usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I was building some packages in the background at the time. I was ssh'ed in
from a linux box, and running x2x to allow me to move the mouse and keyboard
from the linux box to the DF box. At the time of the lock up, I was moving
the mouse around on the DF box. Though I have no idea if that's related to
the crash, that's the only thing that comes to mind.
Adam
Updated by akirchhoff135014 over 14 years ago
On Tuesday 07 September 2010 19:22:57 Adam K Kirchhoff wrote:
SMP enabled, otherwise a GENERIC kernel.
Full backtrace:
(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at
/usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf
chain")
at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at
/usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at
/usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>,
m=0xe20ca900, nam=0x0, control=0x0,
td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at
/usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1,
mplocked=0) at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at
/usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address
0x8 ) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I was building some packages in the background at the time. I was ssh'ed
in from a linux box, and running x2x to allow me to move the mouse and
keyboard from the linux box to the DF box. At the time of the lock up, I
was moving the mouse around on the DF box. Though I have no idea if
that's related to the crash, that's the only thing that comes to mind.
Adam
The kernel and core are in my ~/crash folder on leaf.
Adam
Updated by dillon over 14 years ago
:> I was building some packages in the background at the time. I was ssh'ed
:> in from a linux box, and running x2x to allow me to move the mouse and
:> keyboard from the linux box to the DF box. At the time of the lock up, I
:> was moving the mouse around on the DF box. Though I have no idea if
:> that's related to the crash, that's the only thing that comes to mind.
:>
:> Adam
:
:The kernel and core are in my ~/crash folder on leaf.
:
:Adam
Very interesting. Somehow the so_snd sockbuf has fewer bytes worth
of mbufs than it says in its sb_cc field. I have no idea how it
managed to get into that state. We will have to keep an eye on
things and collect more information.
-Matt
Matthew Dillon
<dillon@backplane.com>
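The inconsistency Matt describes can be illustrated with a minimal sketch in C. The struct and function names below (sketch_mbuf, sketch_sockbuf, sketch_chain_bytes, sketch_sockbuf_consistent, sketch_range_fits) are hypothetical simplifications made up for this illustration; only the field names m_next, m_len, sb_mb and sb_cc mirror the kernel's, and this is not the kernel's code. It assumes that a copy request sized from sb_cc walks off the end of a chain that holds fewer bytes than sb_cc claims, which would be consistent with m=0x0 in frame #5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's mbuf and sockbuf. */
struct sketch_mbuf {
    struct sketch_mbuf *m_next; /* next mbuf in the chain */
    int m_len;                  /* bytes of data held by this mbuf */
};

struct sketch_sockbuf {
    struct sketch_mbuf *sb_mb;  /* head of the mbuf chain */
    int sb_cc;                  /* byte count the sockbuf claims to hold */
};

/* Sum the bytes actually present in the chain. */
static int
sketch_chain_bytes(const struct sketch_mbuf *m)
{
    int total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Returns 1 if the claimed byte count matches the chain, 0 otherwise. */
static int
sketch_sockbuf_consistent(const struct sketch_sockbuf *sb)
{
    return sketch_chain_bytes(sb->sb_mb) == sb->sb_cc;
}

/*
 * Returns 1 if the byte range [off, off + len) lies inside the chain,
 * 0 if the chain runs out first. A real kernel panics in the second
 * case; this sketch only reports it.
 */
static int
sketch_range_fits(const struct sketch_mbuf *m, int off, int len)
{
    assert(off >= 0 && len >= 0);

    /* Skip the mbufs that are entirely covered by the offset. */
    while (off > 0) {
        if (m == NULL)
            return 0;
        if (off < m->m_len)
            break;
        off -= m->m_len;
        m = m->m_next;
    }
    /* Consume len bytes, starting off bytes into the current mbuf. */
    while (len > 0) {
        int avail;

        if (m == NULL)
            return 0;
        avail = m->m_len - off;
        len -= (avail < len) ? avail : len;
        off = 0;
        m = m->m_next;
    }
    return 1;
}
```

With a two-mbuf chain holding 32 and 16 bytes and sb_cc set to 56, sketch_sockbuf_consistent() reports a mismatch, and a request for off=32, len=24 (the values in frame #5) runs past the end of the chain.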
Updated by dillon over 14 years ago
Adam, was the machine that crashed serving NFS ? I was able to
reproduce the exact same crash while serving NFS.
-Matt
Updated by akirchhoff135014 over 14 years ago
On Wed, 8 Sep 2010 10:12:00 -0700 (PDT)
Matthew Dillon <dillon@apollo.backplane.com> wrote:
Adam, was the machine that crashed serving NFS ? I was able to
reproduce the exact same crash while serving NFS.
-Matt
The NFS server was running, but nothing was connected at the time.
Updated by dillon over 14 years ago
I've pushed a bunch of work, please update to the latest master
and continue testing!
No smoking gun but I suspect machine load may be causing m_reclaim()
to get run, which drains various protocol caches. Those caches were
not MPSAFE. The changes address that issue.
If these changes don't fix the problem then compile a fresh kernel
with two options added to your kernel config:
options SOCKBUF_DEBUG
options MBUF_DEBUG
These are fairly invasive options so only compile them in if the
problems have not gone away. Hopefully then we will get a panic
closer to where the actual bad code is instead of well after the
fact.
-Matt
Updated by tuxillo almost 12 years ago
- Description updated (diff)
- Status changed from New to Feedback
- Assignee deleted (0)
Hi Adam,
Did you have a chance to try what Matt said? Can you please provide feedback about it?
Thanks,
Antonio Huete