Bug #1824

kernel panic, x86, 2.7.3.859.ge5104

Added by akirchhoff135014 over 4 years ago. Updated almost 2 years ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

SMP enabled, otherwise a GENERIC kernel.

Full backtrace:

(kgdb) bt
#0 _get_mycpu (di=0xc06ebee0) at ./machine/thread.h:83
#1 md_dumpsys (di=0xc06ebee0) at /usr/src/sys/platform/pc32/i386/dump_machdep.c:263
#2 0xc0315d31 in dumpsys () at /usr/src/sys/kern/kern_shutdown.c:880
#3 0xc03162f1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:387
#4 0xc03165ba in panic (fmt=0xc06122c4 "m_copym, length > size of mbuf chain") at /usr/src/sys/kern/kern_shutdown.c:786
#5 0xc034f662 in m_copym (m=0x0, off0=32, len=24, wait=4) at /usr/src/sys/kern/uipc_mbuf.c:1113
#6 0xc03ec2e6 in tcp_output (tp=0xdcc96d88) at /usr/src/sys/netinet/tcp_output.c:723
#7 0xc03f38da in tcp_usr_send (so=0xc6a73698, flags=<value optimized out>, m=0xe20ca900, nam=0x0, control=0x0, td=0xdcc96988) at /usr/src/sys/netinet/tcp_usrreq.c:762
#8 0xc03518a4 in netmsg_pru_send (msg=0xe5c15b5c) at /usr/src/sys/kern/uipc_msg.c:548
#9 0xc03a4674 in netmsg_service (msg=0xe5c15b5c, mpsafe_mode=1, mplocked=0) at /usr/src/sys/net/netisr.c:310
#10 0xc03a478a in netmsg_service_loop (arg=0xc068cda0) at /usr/src/sys/net/netisr.c:357
#11 0xc031f2ae in lwkt_deschedule_self (td=Cannot access memory at address 0x8) at /usr/src/sys/kern/lwkt_thread.c:278
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I was building some packages in the background at the time. I was ssh'ed in
from a Linux box and running x2x so that I could drive the DF box's mouse and
keyboard from the Linux box. When it locked up, I was moving the mouse around
on the DF box. I have no idea whether that is related to the crash, but it is
the only thing that comes to mind.

Adam

History

#1 Updated by akirchhoff135014 over 4 years ago

On Tuesday 07 September 2010 19:22:57 Adam K Kirchhoff wrote:
> I was building some packages in the background at the time. I was ssh'ed
> in from a linux box, and running x2x to allow me to move the mouse and
> keyboard from the linux box to the DF box. At the time of the lock up, I
> was moving the mouse around on the DF box. Though I have no idea if
> that's related to the crash, that's the only thing that comes to mind.
>
> Adam

The kernel and core are in my ~/crash folder on leaf.

Adam

#2 Updated by dillon over 4 years ago

:> I was building some packages in the background at the time. I was ssh'ed
:> in from a linux box, and running x2x to allow me to move the mouse and
:> keyboard from the linux box to the DF box. At the time of the lock up, I
:> was moving the mouse around on the DF box. Though I have no idea if
:> that's related to the crash, that's the only thing that comes to mind.
:>
:> Adam
:
:The kernel and core are in my ~/crash folder on leaf.
:
:Adam

Very interesting. Somehow the so_snd sockbuf holds fewer bytes of
mbuf data than its sb_cc field claims. I have no idea how it
managed to get into that state. We will have to keep an eye on
things and collect more information.
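
To make the mismatch concrete, here is a minimal user-space sketch, not the kernel code. The toy_* structs and functions are hypothetical stand-ins that keep only the fields needed for the illustration, and the chain sizes are made up. It shows how a range walk like the one in frame #5 (off0=32, len=24) runs off the end of a chain that holds fewer bytes than the byte counter says.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel mbuf. */
struct toy_mbuf {
    struct toy_mbuf *m_next;  /* next buffer in the chain */
    int              m_len;   /* bytes of data held in this buffer */
};

/* Hypothetical, simplified stand-in for a socket buffer. */
struct toy_sockbuf {
    struct toy_mbuf *sb_mb;   /* head of the mbuf chain */
    int              sb_cc;   /* byte count the sockbuf claims to hold */
};

/* Sum the bytes actually present in the chain. */
static int
toy_chain_bytes(const struct toy_mbuf *m)
{
    int total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/*
 * Walk the chain the way a range copy would: skip `off` bytes, then
 * consume `len` bytes. Returns 0 if the chain covers the range, or -1
 * if the chain ends first (the situation the panic message describes).
 */
static int
toy_copy_range_ok(const struct toy_mbuf *m, int off, int len)
{
    while (off > 0) {
        if (m == NULL)
            return -1;
        if (off < m->m_len)
            break;
        off -= m->m_len;
        m = m->m_next;
    }
    while (len > 0) {
        if (m == NULL)
            return -1;
        int avail = m->m_len - off;
        len -= (avail < len) ? avail : len;
        off = 0;
        m = m->m_next;
    }
    return 0;
}
```

With two 20-byte buffers chained together and sb_cc set to 56, toy_chain_bytes() reports 40 and toy_copy_range_ok(head, 32, 24) returns -1, because the caller trusted the counter rather than the chain.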

-Matt
Matthew Dillon

#3 Updated by dillon over 4 years ago

Adam, was the machine that crashed serving NFS? I was able to
reproduce the exact same crash while serving NFS.

-Matt

#4 Updated by akirchhoff135014 over 4 years ago

On Wed, 8 Sep 2010 10:12:00 -0700 (PDT)
Matthew Dillon wrote:

> Adam, was the machine that crashed serving NFS ? I was able to
> reproduce the exact same crash while serving NFS.
>
> -Matt

The NFS server was running, but nothing was connected at the time.

#5 Updated by dillon over 4 years ago

I've pushed a bunch of work, please update to the latest master
and continue testing!

No smoking gun, but I suspect machine load may be causing m_reclaim()
to run, which drains various protocol caches. Those caches were
not MPSAFE. The changes address that issue.

If these changes don't fix the problem then compile a fresh kernel
with two options added to your kernel config:

options SOCKBUF_DEBUG
options MBUF_DEBUG

These are fairly invasive options so only compile them in if the
problems have not gone away. Hopefully then we will get a panic
closer to where the actual bad code is instead of well after the
fact.
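
As an illustration of why checking early helps, here is a minimal user-space sketch of a check-on-every-mutation pattern. It is not the actual SOCKBUF_DEBUG or MBUF_DEBUG code; the dbg_* names and structs are hypothetical. The idea is that each operation that changes the buffer re-verifies the byte counter against the chain, so a bad update trips an assertion at the offending call instead of much later in an unrelated code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mbuf and sockbuf for the sketch. */
struct dbg_mbuf {
    struct dbg_mbuf *m_next;  /* next buffer in the chain */
    int              m_len;   /* bytes of data in this buffer */
};

struct dbg_sockbuf {
    struct dbg_mbuf *sb_mb;   /* head of the chain */
    int              sb_cc;   /* claimed total byte count */
};

/* Return 1 if sb_cc matches the bytes actually on the chain, else 0. */
static int
dbg_sb_consistent(const struct dbg_sockbuf *sb)
{
    const struct dbg_mbuf *m;
    int total = 0;

    for (m = sb->sb_mb; m != NULL; m = m->m_next)
        total += m->m_len;
    return total == sb->sb_cc;
}

/* Append a chain to the tail, update the counter, then verify. */
static void
dbg_sb_append(struct dbg_sockbuf *sb, struct dbg_mbuf *m)
{
    struct dbg_mbuf **pp = &sb->sb_mb;

    while (*pp != NULL)
        pp = &(*pp)->m_next;
    *pp = m;
    for (; m != NULL; m = m->m_next)
        sb->sb_cc += m->m_len;
    assert(dbg_sb_consistent(sb));
}

/* Unlink and return the first buffer, update the counter, then verify. */
static struct dbg_mbuf *
dbg_sb_drop_head(struct dbg_sockbuf *sb)
{
    struct dbg_mbuf *m = sb->sb_mb;

    if (m != NULL) {
        sb->sb_mb = m->m_next;
        sb->sb_cc -= m->m_len;
        m->m_next = NULL;
    }
    assert(dbg_sb_consistent(sb));
    return m;
}
```

The cost is a full chain walk on every append and drop, which is the kind of overhead that makes such checks suitable only for debugging builds.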

-Matt

#6 Updated by tuxillo almost 2 years ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee deleted (0)

Hi Adam,

Did you have a chance to try what Matt said? Can you please provide feedback about it?

Thanks,
Antonio Huete
