Bug #1824

open

kernel panic, x86, 2.7.3.859.ge5104

Added by akirchhoff135014 about 11 years ago. Updated over 8 years ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

SMP enabled, otherwise a GENERIC kernel.

Full backtrace:

(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at /usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf chain") at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at /usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at /usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>, m=0xe20ca900, nam=0x0, control=0x0, td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at /usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1, mplocked=0) at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at /usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address 0x8) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
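
Frame #5 is where the panic originates: m_copym() was asked to copy len=24 bytes starting at offset off0=32, and the mbuf chain ran out before it could supply them. The C sketch below models only that bounds check, using a hypothetical, simplified `struct mbuf` with just `m_next` and `m_len`; it is not the kernel's m_copym(), which also allocates and copies mbufs. In this model the walking pointer is NULL at the point of failure, which is consistent with frame #5 reporting m=0x0, though the real code may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mbuf: only the fields the bounds check needs. */
struct mbuf {
    struct mbuf *m_next;  /* next mbuf in the chain */
    int          m_len;   /* bytes of data held in this mbuf */
};

/*
 * Returns 1 if the chain starting at m holds at least off0 + len bytes,
 * i.e. a copy of len bytes starting at offset off0 could succeed.
 * Returns 0 where the kernel would instead panic with
 * "m_copym, length > size of mbuf chain".
 */
int chain_covers(const struct mbuf *m, int off0, int len)
{
    int off = off0;

    assert(off0 >= 0 && len >= 0);

    /* Skip whole mbufs that lie entirely before the starting offset. */
    while (off > 0) {
        if (m == NULL)
            return 0;          /* offset lies past the end of the chain */
        if (off < m->m_len)
            break;             /* the copy starts inside this mbuf */
        off -= m->m_len;
        m = m->m_next;
    }

    /* Consume len bytes from the remaining mbufs. */
    while (len > 0) {
        if (m == NULL)
            return 0;          /* chain exhausted: walking pointer is NULL */
        int avail = m->m_len - off;
        len -= (avail < len) ? avail : len;
        off = 0;
        m = m->m_next;
    }
    return 1;
}
```

For example, a hypothetical chain of one 32-byte and one 16-byte mbuf holds 48 bytes, so a request for 24 bytes at offset 32 (56 bytes in total) fails, which matches the shape of the request in frame #5.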

I was building some packages in the background at the time. I was ssh'ed in
from a Linux box and running x2x to move the mouse and keyboard
from the Linux box to the DF box. At the time of the lockup, I was moving
the mouse around on the DF box. I have no idea whether that's related to
the crash, but it's the only thing that comes to mind.

Adam

Actions #1

Updated by akirchhoff135014 about 11 years ago

On Tuesday 07 September 2010 19:22:57 Adam K Kirchhoff wrote:


The kernel and core are in my ~/crash folder on leaf.

Adam

Actions #2

Updated by dillon about 11 years ago


Very interesting. Somehow the so_snd sockbuf holds fewer bytes' worth
of mbufs than its sb_cc field says it does. I have no idea how it
managed to get into that state. We will have to keep an eye on
things and collect more information.
-Matt
Matthew Dillon
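
Matt's observation describes a broken accounting invariant: sb_cc should equal the total m_len of the mbufs linked into the sockbuf, and here it was larger. A minimal C sketch of such a consistency check follows. The structures are hypothetical and heavily simplified (no m_nextpkt record chains, no other sockbuf fields), so treat it as an illustration of the invariant rather than as DragonFly's actual debug code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures; record boundaries are ignored. */
struct mbuf {
    struct mbuf *m_next;
    int          m_len;
};

struct sockbuf {
    struct mbuf *sb_mb;   /* first mbuf queued in the buffer */
    unsigned     sb_cc;   /* byte count the buffer claims to hold */
};

/* Sum the data actually present on the chain. */
static unsigned chain_bytes(const struct mbuf *m)
{
    unsigned total = 0;
    for (; m != NULL; m = m->m_next) {
        assert(m->m_len >= 0);
        total += (unsigned)m->m_len;
    }
    return total;
}

/*
 * Returns 1 if sb_cc matches the mbufs on the chain, 0 otherwise.
 * If sb_cc is larger than the real total (the state described above),
 * a later copy sized from sb_cc can run off the end of the chain.
 */
int sockbuf_consistent(const struct sockbuf *sb)
{
    return chain_bytes(sb->sb_mb) == sb->sb_cc;
}
```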
Actions #3

Updated by dillon about 11 years ago

Adam, was the machine that crashed serving NFS? I was able to
reproduce the exact same crash while serving NFS.

-Matt
Actions #4

Updated by akirchhoff135014 about 11 years ago

On Wed, 8 Sep 2010 10:12:00 -0700 (PDT)
Matthew Dillon <> wrote:


The NFS server was running, but nothing was connected at the time.

Actions #5

Updated by dillon about 11 years ago

I've pushed a bunch of work; please update to the latest master
and continue testing!

No smoking gun, but I suspect machine load may be causing m_reclaim()
to run, which drains various protocol caches. Those caches were
not MPSAFE; the changes address that issue.
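
The concern here is that a drain triggered by m_reclaim() could touch a protocol cache at the same time as another CPU using it, because the cache code was not safe to run concurrently without the global MP lock. As a generic illustration of what making such a drain MPSAFE involves, here is a user-space C sketch of a hypothetical cache (`struct proto_cache`, `cache_put()`, `cache_get()`, `cache_drain()` are invented names) protected by a pthread mutex. It is not the DragonFly code or the fix Matt pushed; the kernel uses its own locking primitives, not pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-protocol cache of preallocated entries. */
struct entry {
    struct entry *next;
};

struct proto_cache {
    pthread_mutex_t lock;    /* serializes every access, including the drain */
    struct entry   *head;
    int             count;   /* must always equal the length of the list */
};

void cache_init(struct proto_cache *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->head = NULL;
    c->count = 0;
}

void cache_put(struct proto_cache *c, struct entry *e)
{
    assert(e != NULL);
    pthread_mutex_lock(&c->lock);
    e->next = c->head;
    c->head = e;
    c->count++;
    pthread_mutex_unlock(&c->lock);
}

/* Returns an entry, or NULL if the cache is empty. */
struct entry *cache_get(struct proto_cache *c)
{
    pthread_mutex_lock(&c->lock);
    struct entry *e = c->head;
    if (e != NULL) {
        c->head = e->next;
        c->count--;
    }
    pthread_mutex_unlock(&c->lock);
    return e;
}

/*
 * Reclaim path (analogous to a drain run from m_reclaim()).
 * Without the lock, this could free entries while another thread is in
 * the middle of cache_get()/cache_put(), leaving head and count out of sync.
 */
void cache_drain(struct proto_cache *c)
{
    pthread_mutex_lock(&c->lock);
    struct entry *e = c->head;   /* detach the whole list under the lock */
    c->head = NULL;
    c->count = 0;
    pthread_mutex_unlock(&c->lock);
    while (e != NULL) {          /* free outside the lock: list is now private */
        struct entry *next = e->next;
        free(e);
        e = next;
    }
}

/* Returns 1 if count matches the actual list length. */
int cache_consistent(struct proto_cache *c)
{
    pthread_mutex_lock(&c->lock);
    int n = 0;
    for (struct entry *e = c->head; e != NULL; e = e->next)
        n++;
    int ok = (n == c->count);
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```

Detaching the list under the lock and freeing it afterwards keeps the critical section short, so the drain blocks normal users of the cache only briefly.
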
If these changes don't fix the problem, compile a fresh kernel
with these two options added to your kernel config:

options SOCKBUF_DEBUG
options MBUF_DEBUG

These are fairly invasive options, so only compile them in if the
problems have not gone away. Hopefully we will then get a panic
closer to where the actual bad code is, instead of well after the
fact.
-Matt
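
The point of the debug options is to check invariants at every modification, instead of waiting for a later consumer such as tcp_output() to trip over the damage. The sketch below illustrates that fail-fast idea with hypothetical helpers (`sb_append()`, `sb_dequeue()`, `sb_check()`) on simplified structures; it does not reproduce what SOCKBUF_DEBUG or MBUF_DEBUG actually compile in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures (record boundaries ignored). */
struct mbuf {
    struct mbuf *m_next;
    int          m_len;
};

struct sockbuf {
    struct mbuf *sb_mb;
    unsigned     sb_cc;
};

/* Hypothetical stand-in for a debug-only consistency check. */
static void sb_check(const struct sockbuf *sb)
{
    unsigned total = 0;
    for (const struct mbuf *m = sb->sb_mb; m != NULL; m = m->m_next)
        total += (unsigned)m->m_len;
    /* Fails at the mutation that broke the invariant, not much later. */
    assert(total == sb->sb_cc);
}

/* Append one mbuf at the tail and account for its bytes. */
void sb_append(struct sockbuf *sb, struct mbuf *m)
{
    struct mbuf **pp = &sb->sb_mb;
    while (*pp != NULL)
        pp = &(*pp)->m_next;
    m->m_next = NULL;
    *pp = m;
    sb->sb_cc += (unsigned)m->m_len;
    sb_check(sb);
}

/* Remove the first mbuf, if any, and account for its bytes. */
struct mbuf *sb_dequeue(struct sockbuf *sb)
{
    struct mbuf *m = sb->sb_mb;
    if (m != NULL) {
        sb->sb_mb = m->m_next;
        sb->sb_cc -= (unsigned)m->m_len;
        m->m_next = NULL;
    }
    sb_check(sb);
    return m;
}
```

Because sb_check() relies on assert(), building with -DNDEBUG turns the assertion into a no-op, loosely like leaving the debug options out of the kernel config.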
Actions #6

Updated by tuxillo over 8 years ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee deleted (0)

Hi Adam,

Did you have a chance to try what Matt said? Can you please provide feedback about it?

Thanks,
Antonio Huete
