DragonFly BSD
 

Issue957

Title: CARP panic
Priority: (not set)
Status: chatting
Superseder: (not set)
Nosy List: elekktretterr
Assigned To: (not set)
Topics: (not set)

Created on 2008-03-02.23:02:02 by elekktretterr, last changed 2008-07-24.16:41:02 by nant.

Files
File name Uploaded Type
ethersubr.txt sepherosa, 2008-07-24.02:07:13 text/plain
panic elekktretterr, 2008-07-24.01:10:04 image/jpeg
Messages
msg5029 (view) Author: nant Date: 2008-07-24.16:41:01
It applies cleanly on my 2.0 checkout tree.

[tahngarth] /Development/DragonFly-2.0/src/sys/net> patch -C < /tmp/ethersubr.txt
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: if_ethersubr.c
|===================================================================
|RCS file: /dcvs/src/sys/net/if_ethersubr.c,v
|retrieving revision 1.76
|diff -u -p -r1.76 if_ethersubr.c
|--- if_ethersubr.c     8 Jul 2008 13:50:52 -0000       1.76
|+++ if_ethersubr.c     24 Jul 2008 01:56:50 -0000
--------------------------
Patching file if_ethersubr.c using Plan A...
Hunk #1 succeeded at 1658.
Hunk #2 succeeded at 1668.
done
msg5028 (view) Author: sepherosa Date: 2008-07-24.16:40:01
Almost nothing related to networking has changed since the 2.0
branch.  Either way, I suggest you rebuild world and kernel.
If it still does not work, please post the dmesg.

Best Regards,
sephe
msg5027 (view) Author: elekktretterr Date: 2008-07-24.16:27:01
Shouldn't the patch apply cleanly against 2.0, though, since that is
what my CVS tree is?
msg5023 (view) Author: sepherosa Date: 2008-07-24.11:29:03
I don't believe that the patch has anything to do with the missing
interfaces.  I think that before you applied the patch your tree was
at a point before MPLS was brought in, so the patch did not apply
cleanly.  I suspect your kernel is out of sync with your world.
Please rebuild world and kernel.

Yep, that's what it looks like after patching.

Best Regards,
sephe
msg5022 (view) Author: elekktretterr Date: 2008-07-24.10:59:00
Hey Sephe,
I had to apply the patch manually, as it failed to apply cleanly (on DF
2.0-RELEASE). Unfortunately the patch does more harm than good. None of
the network interfaces could be used: ifconfig shows that they exist,
but if I ran something like ifconfig bge0 or ifconfig carp0 it would
simply say that bge0 doesn't exist, and thus there were no IP addresses
associated with the interfaces.

This is the patched section of the file:

static __inline struct lwkt_port *
ether_mport(int num, struct mbuf **m)
{
        if (num == NETISR_MAX) {
                /*
                 * All packets whose target msgports can't be
                 * determined here are dispatched to netisr0,
                 * where further dispatching may happen.
                 */
                return cpu_portfn(0);
        }

        return netisr_find_port(num, m);
}

Did I miss something while patching it manually?

Cheers,
Petr
msg5015 (view) Author: sepherosa Date: 2008-07-24.02:07:33
On Thu, Jul 24, 2008 at 9:05 AM, Petr Janda
<elekktretterr@exemail.com.au> wrote:
> Hi Sephe,
> I've just got a panic on the mailserver, saying:
>
> panic: assertion, m == NULL in ether_input_chain2
>
> Attached is an image of the panic.
> Can you fix it? Thanks a lot.

Try the attached patch.

Best Regards,
sephe
msg5014 (view) Author: elekktretterr Date: 2008-07-24.01:10:04
Hi Sephe,
I've just got a panic on the mailserver, saying:

panic: assertion, m == NULL in ether_input_chain2

Attached is an image of the panic.
Can you fix it? Thanks a lot.

Petr
msg4922 (view) Author: elekktretterr Date: 2008-07-11.02:14:02
Hey,
Can we at least make it an option? If ETHER_INPUT2 is defined, use the
patched CARP; if not, use the old one. I know it's a bit of a hack, but
it's only temporary.

Cheers,
Petr
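
The compile-time switch proposed above could look roughly like the sketch below. It only illustrates the idea and is not committed code: CARP_LOCK_FLAGS and carp_lock_flags() are hypothetical names, and the LK_NOWAIT value is a placeholder so that the fragment builds outside the kernel. In the real tree the selected flags would be passed as the last argument of the lockinit() call in CARP_LOCK_INIT.

```c
/*
 * Placeholder so the sketch compiles in userland; in the kernel the real
 * LK_NOWAIT definition comes from the lockmgr headers.
 */
#ifndef LK_NOWAIT
#define LK_NOWAIT 0x10
#endif

/*
 * Hypothetical switch: with "options ETHER_INPUT2" the CARP lock may
 * block (flags 0); without it the old non-blocking behaviour is kept.
 */
#ifdef ETHER_INPUT2
#define CARP_LOCK_FLAGS 0
#else
#define CARP_LOCK_FLAGS LK_NOWAIT
#endif

/* Helper that exposes the selected flags so they can be inspected. */
static int
carp_lock_flags(void)
{
        return CARP_LOCK_FLAGS;
}
```

Building with -DETHER_INPUT2 selects the blocking variant; building without it keeps LK_NOWAIT.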
msg4921 (view) Author: sepherosa Date: 2008-07-11.01:28:03
Hi,

Using ETHER_INPUT2 is tied to the CARP lock flag change.  Enabling
ETHER_INPUT2 by default is a little risky for this release, and
ipflow can't be made to work in a compat fashion.  You can just keep
the patch locally; if you have other NICs that you want to use with
CARP, please just tell me.  Currently the following NICs have
ETHER_INPUT2 support (so I could test that code path from day to day
before release :)
bge(4) bce(4) em(4) et(4) msk(4) nfe(4) re(4) bfe(4) fxp(4) xl(4)

Once the release is done, I will switch to ETHER_INPUT2 in the repo,
along with the CARP lock flag change of course.

Best Regards,
sephe
msg4920 (view) Author: elekktretterr Date: 2008-07-11.00:20:03
Sephe,
Can you please commit the fix?

Cheers,
Petr
msg4909 (view) Author: elekktretterr Date: 2008-07-10.03:14:02
Good news,
so far no panic after 3 days. Please commit a fix to HEAD.

Petr
msg4885 (view) Author: elekktretterr Date: 2008-07-08.10:39:02
Hey guys,
So far no panic (2 hours and going) with the patch.
A couple of things, though:

1) The carp interface cannot be properly configured if
net.inet.preempt=1 is set before the carp network interface is up. It
gives an error saying "arp_rtrequest: bad gateway value" or something
like that, and thus the setting cannot be put into sysctl.conf.

If it is set after the carp/physical interface is up and configured, it
works fine.

2) Maildirs are stored on and read from an NFS share. If I unplug the
cable to cause the master to go down, commands run on the command line
(e.g. ls /usr/local/nfs_share) seem to freeze until I plug the cable in
again.

Cheers,
petr
msg4824 (view) Author: dillon Date: 2008-07-03.03:36:00
:> Hey there,
:>
:>  Can someone please do something about the CARP panic on SMP boxes before the
:>  release? Ive got a box pending to go in production, but i cant because of the
:>  panic.
:>
:>  Thanks so much to anyone who can do it.
:
:Since you are using bge(4), I suggest you do following things:
:- Add options ETHER_INPUT_CHAIN and options ETHER_INPUT2 in your
:kernel config file
:- Change line 2119 in netinet/ip_carp.c
:  from:
:  #define CARP_LOCK_INIT(cif) lockinit(&(cif)->vhif_lock, "carp_if", 0, LK_NOWAIT);
:  to:
:  #define CARP_LOCK_INIT(cif) lockinit(&(cif)->vhif_lock, "carp_if", 0,  0);
:  i.e. strip the LK_NOWAIT
:
:Hope it will work for you.
:
:Best Regards,
:sephe

    If that doesn't work post another traceback and I'll try to track
    it down.  It kinda looks like one thread is locking and another is
    unlocking, which isn't legal without setting the lock holder to a
    special value.  But it could also have been due to that LK_NOWAIT.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4823 (view) Author: sepherosa Date: 2008-07-03.03:30:01
Since you are using bge(4), I suggest you do the following:
- Add options ETHER_INPUT_CHAIN and options ETHER_INPUT2 to your
kernel config file
- Change line 2119 in netinet/ip_carp.c
  from:
  #define CARP_LOCK_INIT(cif) lockinit(&(cif)->vhif_lock, "carp_if", 0, LK_NOWAIT);
  to:
  #define CARP_LOCK_INIT(cif) lockinit(&(cif)->vhif_lock, "carp_if", 0, 0);
  i.e. strip the LK_NOWAIT

Hope it will work for you.

Best Regards,
sephe
msg4822 (view) Author: elekktretterr Date: 2008-07-03.03:10:02
Hey there,

Can someone please do something about the CARP panic on SMP boxes before the
release? I've got a box pending to go into production, but I can't because of
the panic.

Thanks so much to anyone who can do it.
Petr
msg4394 (view) Author: elekktretterr Date: 2008-03-17.06:12:01
It would be great if someone fixed this at some point, as at the moment CARP 
is basically useless.

Petr
msg4334 (view) Author: TGEN Date: 2008-03-03.15:31:02
I'd prefer CARP to grow EBUSY handling rather than changing our locking 
primitives...
-- 
         Thomas E. Spanjaard
         tgen@netphreax.net
msg4333 (view) Author: elekktretterr Date: 2008-03-02.23:54:00
What are the implications of that?

Petr
msg4332 (view) Author: dillon Date: 2008-03-02.23:37:01
:This box is the MASTER in the CARP domain and it paniced today:
:
:http://www.punchyouremployer.com/files/carp_panic.JPG
:
:Petr

    Well, I see a problem immediately.  The lockmgr macros in ip_carp.c
    are specifying non-blocking locks but all the calls to them assume
    success.

    I'm not sure what the best solution is here.  The lockmgr is not supposed
    to be called from an interrupt, which is why I think it was coded
    LK_NOWAIT in the carp port, but the carp code doesn't check for an
    EBUSY return value.  That's a real problem.
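
The failure mode described above (a non-blocking acquire whose result is never checked) can be shown with a small userland analogy. This is a sketch only: struct demo_lock and every demo_/update_ function are hypothetical stand-ins built on C11 atomics. They are not lockmgr and not the ip_carp.c code.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a lock; not the kernel's struct lock. */
struct demo_lock {
        atomic_flag held;
};

static void
demo_lock_init(struct demo_lock *lk)
{
        atomic_flag_clear(&lk->held);
}

/* Non-blocking acquire: returns 0 on success, EBUSY if already held. */
static int
demo_lock_try(struct demo_lock *lk)
{
        return atomic_flag_test_and_set(&lk->held) ? EBUSY : 0;
}

static void
demo_lock_release(struct demo_lock *lk)
{
        atomic_flag_clear(&lk->held);
}

/*
 * Buggy pattern: the result of the non-blocking acquire is ignored, so
 * the update runs unprotected on contention and the release drops a
 * lock that this caller never owned.
 */
static void
update_unchecked(struct demo_lock *lk, int *counter)
{
        (void)demo_lock_try(lk);
        (*counter)++;
        demo_lock_release(lk);
}

/* Checked pattern: give up and report EBUSY instead of proceeding. */
static int
update_checked(struct demo_lock *lk, int *counter)
{
        int error = demo_lock_try(lk);

        if (error != 0)
                return error;
        (*counter)++;
        demo_lock_release(lk);
        return 0;
}
```

When the lock is already held, update_checked() returns EBUSY and leaves the counter alone, while update_unchecked() both modifies the counter and releases the other holder's lock.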

    We can't use spin locks, there are blockable paths called with the
    CARP_LOCK held.   It looks like our only real choice is to allow
    lockmgr locks from interrupts... which by the way precludes being able
    to use a FAST interrupt for network interrupts (which we don't at the
    moment anyway, but...).

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
msg4331 (view) Author: elekktretterr Date: 2008-03-02.23:02:00
by the way this is an SMP box.

Petr
History
Date                 User           Action  Args
2008-07-24 16:41:02  nant           set     messages: + msg5029
2008-07-24 16:40:03  sepherosa      set     messages: + msg5028
2008-07-24 16:27:01  elekktretterr  set     messages: + msg5027
2008-07-24 11:29:06  sepherosa      set     messages: + msg5023
2008-07-24 10:59:02  elekktretterr  set     messages: + msg5022
2008-07-24 02:07:35  sepherosa      set     files: + ethersubr.txt; messages: + msg5015
2008-07-24 01:10:07  elekktretterr  set     files: + panic; messages: + msg5014
2008-07-11 02:14:06  elekktretterr  set     messages: + msg4922
2008-07-11 01:28:08  sepherosa      set     messages: + msg4921
2008-07-11 00:20:10  elekktretterr  set     messages: + msg4920; title: CARP Panic -> CARP panic
2008-07-10 03:14:06  elekktretterr  set     messages: + msg4909; title: CARP panic -> CARP Panic
2008-07-08 10:39:08  elekktretterr  set     messages: + msg4885
2008-07-03 03:36:02  dillon         set     messages: + msg4824
2008-07-03 03:30:02  sepherosa      set     messages: + msg4823
2008-07-03 03:10:06  elekktretterr  set     messages: + msg4822
2008-04-21 19:09:58  hasso          link    issue956 superseder
2008-03-17 06:12:02  elekktretterr  set     messages: + msg4394
2008-03-03 15:31:04  TGEN           set     messages: + msg4334
2008-03-02 23:54:00  elekktretterr  set     messages: + msg4333
2008-03-02 23:37:04  dillon         set     status: unread -> chatting; messages: + msg4332
2008-03-02 23:02:02  elekktretterr  create