Bug #346

altq kernel panic with recent HEAD

Added by kylebutt over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

Core and kernel forthcoming to my leaf account.
The panic happens upon loading pf with altq. It doesn't happen if I comment out
altq. It is repeatable. I can give my pf.conf if that would help.

Here's a backtrace:

(kgdb) bt
#0 dumpsys () at thread.h:83
#1 0xc02b5e54 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:355
#2 0xc02b65cd in panic (fmt=0xc0528e93 "from debugger") at /usr/src/sys/kern/kern_shutdown.c:757
#3 0xc016dbbe in db_panic (addr=-1068705345, have_addr=0, count=-1, modif=0xd6a2d578 "") at /usr/src/sys/ddb/db_command.c:447
#4 0xc016db53 in db_command (last_cmdp=0xc05e7290, cmd_table=0x0, aux_cmd_tablep=0xc0589bb4, aux_cmd_tablep_end=0xc0589bcc)
at /usr/src/sys/ddb/db_command.c:343
#5 0xc016dc33 in db_command_loop () at /usr/src/sys/ddb/db_command.c:469
#6 0xc01707b8 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_trap.c:71
#7 0xc04cd5eb in kdb_trap (type=3, code=0, regs=0xd6a2d69c) at /usr/src/sys/i386/i386/db_interface.c:148
#8 0xc04e3db7 in trap (frame=
{tf_fs = -910163944, tf_es = -1046806512, tf_ds = -694026224, tf_edi = 1, tf_esi = -1068340907, tf_ebp = -693971228, tf_isp = -693971256, tf_ebx = 256, tf_edx = -1072988160, tf_ecx = 32, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1068705345, tf_cs = 8, tf_eflags = 646, tf_esp = -1068184811, tf_ss = -1068265439}) at /usr/src/sys/i386/i386/trap.c:785
#9 0xc04ce80f in calltrap () at /usr/src/sys/i386/i386/exception.s:774
#10 0xc04cd9bf in Debugger (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:73
#11 0xc02b65c5 in panic (fmt=0xc0526955 "assertion: %s in %s") at /usr/src/sys/kern/kern_shutdown.c:755
#12 0xc0320389 in altq_enable (ifq=0xd500da84) at /usr/src/sys/net/altq/altq_subr.c:155
#13 0xd6d6fc1a in ?? ()
#14 0xd500da84 in ?? ()
#15 0xc0e5b0a4 in ?? ()
#16 0xc18ffd70 in ?? ()
#17 0xd6a2d858 in ?? ()
#18 0x00004000 in ?? ()
#19 0x00000003 in ?? ()
#20 0x0b726000 in ?? ()
#21 0x00000000 in ?? ()
#22 0xd6a2d7b8 in ?? ()
#23 0xc04499fa in ffs_getpages (ap=0xd6a2daa8) at ufs_readwrite.c:556
#24 0xc02a0752 in dev_dioctl (dev=0xc00b8000, cmd=18, data=0x12 <Address 0x12 out of bounds>, fflag=18, cred=0x12)
at /usr/src/sys/kern/kern_device.c:173
#25 0xc0310b66 in spec_ioctl (ap=0xc00b8000) at /usr/src/sys/vfs/specfs/spec_vnops.c:375
#26 0xc0310756 in spec_vnoperate (ap=0xc00b8000) at /usr/src/sys/vfs/specfs/spec_vnops.c:125
#27 0xc04512cb in ufs_vnoperatespec (ap=0xc00b8000) at /usr/src/sys/vfs/ufs/ufs_vnops.c:2460
#28 0xc030ba4e in vop_ioctl (ops=0xc00b8000, vp=0x12, command=18, data=0x12 <Address 0x12 out of bounds>, fflag=18, cred=0x12)
at /usr/src/sys/kern/vfs_vopops.c:372
#29 0xc030b5b6 in vn_ioctl (fp=0xc175ff40, com=536888362, data=0xd6a2dc14 "\v", ucred=0x12) at /usr/src/sys/kern/vfs_vnops.c:967
#30 0xc02d3d19 in mapped_ioctl (fd=3, com=536888362, uspc_data=0xb <Address 0xb out of bounds>, map=0x0) at file2.h:87
#31 0xc02d3867 in sys_ioctl (uap=0xc00b8000) at /usr/src/sys/kern/sys_generic.c:526
#32 0xc04e47f6 in syscall2 (frame=
{tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077937048, tf_esi = 2, tf_ebp = -1077937144, tf_isp = -693969548, tf_ebx = 3, tf_edx = 672542184, tf_ecx = -1077937216, tf_eax = 54, tf_trapno = 12, tf_err = 2, tf_eip = 672140808, tf_cs = 31, tf_eflags = 514, tf_esp = -1077937172, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1390
#33 0xc04ce89a in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:852
#34 0x0000001f in ?? ()
#35 0x0000002f in ?? ()
#36 0x00000000 in ?? ()
#37 0x00000000 in ?? ()
#38 0x00000000 in ?? ()
#39 0x00000000 in ?? ()
#40 0x18c96000 in ?? ()
#41 0xff800000 in ?? ()
#42 0xc9c06d00 in ?? ()
#43 0xd6a2dcfc in ?? ()
#44 0xd6a2dccc in ?? ()
#45 0xc060ccd4 in softclock_pcpu_ary ()
#46 0xc02bd098 in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:740
Previous frame inner to this frame (corrupt stack?)

Thanks for looking at it,

Kyle

History

#1 Updated by corecode over 7 years ago

Is this an smp system? which altq classifier/queue do you use?

i think the dump is needed to get some details.

cheers
simon

#2 Updated by kylebutt over 7 years ago

Yes. I have smp enabled on my ht pentium (not for performance, more
for smp testing).
>
> i think the dump is needed to get some details.

Available in ~kylebutt/crash on leaf

Kyle

#3 Updated by kylebutt over 7 years ago

Sorry for the repeat, but it was class-based queuing (cbq).
Here are the applicable sections of pf.conf:

# $OpenBSD: pf.conf,v 1.25 2004/01/29 18:54:29 todd Exp $
# $DragonFly: src/etc/pf.conf,v 1.1 2004/09/21 21:25:28 joerg Exp $
#
# See pf.conf(5) and /usr/share/pf for syntax and examples.

ext_if="em0"

#altq on $ext_if cbq bandwidth 256Kb queue {ssh, http, other}
#queue other bandwidth 10% cbq(default, borrow)
#queue http bandwidth 65% cbq(borrow)
#queue ssh bandwidth 25% {ssh_login, ssh_data}
#queue ssh_login bandwidth 25% priority 4 cbq(ecn)
#queue ssh_data bandwidth 75% cbq(ecn, borrow)

#pass out on $ext_if proto tcp to any port http modulate state queue http
#pass out on $ext_if proto tcp to any port ssh modulate state queue (ssh_data, ssh_login)

#4 Updated by dillon over 7 years ago

:Yes. I have smp enabled on my ht pentium (not for performance, more
:for smp testing).
:>
:> i think the dump is needed to get some details.
:
:Available in ~kylebutt/crash on leaf
:
:Kyle

There is still a mbuf in the IFQ and ifq_len is non-zero. The
altq_purge() call did not purge the IFQ.

In this case altq_purge() calls cbq_request() which calls cbq_purge().
Either cbq_purge() did not get called or it did not actually purge the
whole queue.

Here's a question for the PF masters.... when cbq_add_altq() is called
is it possible for the IFQ to already have packets queued to it?
Because if that is the case then those packets will never be placed
under CBQ management and can cause this situation to occur.
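
A minimal user-space sketch of the situation asked about above, not kernel code. Every `model_*` name is hypothetical, and the purge behaviour (delegating to the discipline once one is attached) is an assumption taken from the description in this comment. It shows how a packet queued before the discipline attaches can survive a purge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these are the kernel's types. */
struct model_pkt  { struct model_pkt *next; };
struct model_disc { int qlen; };          /* models cbq_qlen */
struct model_ifq {
    struct model_pkt  *head;              /* models ifq_head (plain queue) */
    int                len;               /* models ifq_len */
    struct model_disc *disc;              /* NULL until a discipline attaches */
};

/* Push onto the plain queue, used while no discipline is attached
 * (ordering does not matter for this sketch). */
static void model_classic_enqueue(struct model_ifq *q, struct model_pkt *p)
{
    p->next = q->head;
    q->head = p;
    q->len++;
}

/* Attach without draining first: the discipline starts out tracking nothing. */
static void model_attach_unpatched(struct model_ifq *q, struct model_disc *d)
{
    d->qlen = 0;
    q->disc = d;
}

/* Assumed purge behaviour: with a discipline attached, only the packets
 * the discipline tracks are dropped; otherwise the plain queue is emptied. */
static void model_purge(struct model_ifq *q)
{
    if (q->disc != NULL) {
        q->len -= q->disc->qlen;
        q->disc->qlen = 0;
    } else {
        q->head = NULL;
        q->len = 0;
    }
}

/* Queue one packet, attach, purge as altq_enable() does, and return the
 * length that the KKASSERT would then compare against 0. */
static int model_len_after_enable(void)
{
    struct model_pkt p = { NULL };
    struct model_disc d = { 0 };
    struct model_ifq q = { NULL, 0, NULL };

    model_classic_enqueue(&q, &p);
    model_attach_unpatched(&q, &d);
    model_purge(&q);
    return q.len;
}
```

Calling `model_len_after_enable()` from a `main()` returns 1 while the discipline's count stays 0, the same pair of values as `ifq_len` and `cbq_qlen` in the kgdb trace below.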

I have included a patch that should theoretically panic if CBQ is
associated with an IFQ that has packets in it already. Try this
and tell me if it panics here rather than later.

-Matt
Matthew Dillon

Index: altq/altq_cbq.c
===================================================================
RCS file: /cvs/src/sys/net/altq/altq_cbq.c,v
retrieving revision 1.4
diff -u -r1.4 altq_cbq.c
--- altq/altq_cbq.c 5 Sep 2006 00:55:47 -0000 1.4
+++ altq/altq_cbq.c 11 Oct 2006 16:53:28 -0000
@@ -231,6 +231,7 @@
     callout_init(&cbqp->cbq_callout);
     cbqp->cbq_qlen = 0;
     cbqp->ifnp.ifq_ = &ifp->if_snd; /* keep the ifq */
+    KKASSERT(cbqp->ifnp.ifq_->ifq_len == 0);
 
     /* keep the state in pf_altq */
     a->altq_disc = cbqp;

KGDB SESSION TRACE

(kgdb) print *ifq
$1 = {
ifq_head = 0xd6c74b00, <<<<<<<<<< MBUF STILL IN QUEUE
ifq_tail = 0xd6c74b00,
ifq_len = 1, <<<<<<<<<< COUNT OF 1 IS CORRECT
ifq_maxlen = 255,
ifq_drops = 0,
altq_type = 1,
altq_flags = 1,
altq_disc = 0xcce421c0,
altq_ifp = 0xd500d9c0,
altq_enqueue = 0xc031a2da <cbq_enqueue>,
altq_dequeue = 0xc031a425 <cbq_dequeue>,
altq_request = 0xc0319c09 <cbq_request>,
altq_clfier = 0x0,
altq_classify = 0,
altq_tbr = 0x0
}
(kgdb) frame
#12 0xc0320389 in altq_enable (ifq=0xd500da84)
at /usr/src/sys/net/altq/altq_subr.c:155
155 KKASSERT(ifq->ifq_len == 0);
(kgdb)

int
altq_enable(struct ifaltq *ifq)
{
    if (!ifq_is_ready(ifq))
        return ENXIO;
    if (ifq_is_enabled(ifq))
        return 0;

    crit_enter();
    ifq_purge(ifq);
    KKASSERT(ifq->ifq_len == 0); <<<<<<<<<< ASSERTION THAT FIRES
    ifq->altq_flags |= ALTQF_ENABLED;
    if (ifq->altq_clfier != NULL)
        ifq->altq_flags |= ALTQF_CLASSIFY;
    crit_exit();

    return 0;
}

(kgdb) print *(cbq_state_t *)ifq->altq_disc
$3 = {
cbq_qlen = 0, <<<<<<<<<<<< COUNT OF 0 IS INCORRECT,
OR THE CBQ NEVER SAW
THE PACKET IN THE IFQ
cbq_class_tbl = {0x0, 0xc1894e60, 0xc1894f40, 0xc1895020, 0xc1895100,
0xc18951e0, 0xc18952c0, 0x0 <repeats 249 times>},
ifnp = {
queued_ = 0,
efficient_ = 0,
wrr_ = 1,
ns_per_byte_ = 31250,
maxqueued_ = 1,
maxpkt_ = 1500,
qi_ = 0,
qo_ = 0,
active_ = {0xc1894e60, 0xc1894f40, 0x0, 0x0, 0xc18951e0, 0x0, 0x0, 0x0},
na_ = {0, 0, 0, 0, 0, 0, 0, 0},
num_ = {1, 4, 0, 0, 1, 0, 0, 0},
alloc_ = {32000, 38000, 0, 0, 2000, 0, 0, 0},
M_ = {21, 6, 0, 0, 1, 0, 0, 0},
ifq_ = 0xd500da84,
default_ = 0xc1894f40,
root_ = 0xc1894e60,
ctl_ = 0x0,
restart = 0xc031a4ec <cbqrestart>,
borrowed_ = {0x0},
class_ = {0x0},
curlen_ = {0},
now_ = {{
tv_sec = 0,
tv_usec = 0
}},
is_overlimit_ = {0},
cutoff_ = 32,
ifnow_ = {
tv_sec = 0,
tv_usec = 0
},
maxiftime_ = 187500,
pollcache_ = 0x0
},
cbq_callout = {
c_links = {
sle = {
sle_next = 0x0
},
tqe = {
tqe_next = 0x0,
tqe_prev = 0x0
}
},
c_time = 0,
c_arg = 0x0,
c_func = 0,
c_flags = 16,
c_gd = 0x0
}
}
(kgdb)

#5 Updated by kylebutt over 7 years ago

Yes, it crashes right there with the patch applied. I can upload the core and
kernel if you'd like.

Kyle.

#6 Updated by dillon over 7 years ago

:
:
:Kyle Butt added the comment:
:
:Yes, It crashes right there with the patch applied. I can upload core/kernel
:if you'd like.
:
:Kyle.

No, don't bother, that's proof enough. I think the easiest thing to do
is to have the cbq and other queue types clean out the altq before
attaching. I'll work up a patch this evening.

-Matt
Matthew Dillon

#7 Updated by dillon over 7 years ago

Ok Kyle, remove the old patch and please try this patch. If it works
I will add the ifq_purge() to all the ALTQ classes.

-Matt
Matthew Dillon

Index: altq/altq_cbq.c
===================================================================
RCS file: /cvs/src/sys/net/altq/altq_cbq.c,v
retrieving revision 1.4
diff -u -r1.4 altq_cbq.c
--- altq/altq_cbq.c 5 Sep 2006 00:55:47 -0000 1.4
+++ altq/altq_cbq.c 13 Oct 2006 16:31:49 -0000
@@ -231,6 +231,7 @@
     callout_init(&cbqp->cbq_callout);
     cbqp->cbq_qlen = 0;
     cbqp->ifnp.ifq_ = &ifp->if_snd; /* keep the ifq */
+    ifq_purge(&ifp->if_snd);
 
     /* keep the state in pf_altq */
     a->altq_disc = cbqp;
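
A rough user-space sketch of why purging at attach time would help, not the kernel implementation. The `model_*` names are hypothetical, and the purge rule (delegate to the discipline once one is attached) is an assumption based on the analysis earlier in this thread. The queue is drained while the plain queue is still the active one, so the later purge in `altq_enable()` has nothing left to miss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these names are kernel APIs. */
struct model_disc { int qlen; };          /* packets the discipline tracks */
struct model_ifq {
    int                len;               /* models ifq_len */
    struct model_disc *disc;              /* NULL until a discipline attaches */
};

/* Assumed purge behaviour: via the discipline if one is attached,
 * otherwise empty the plain queue. */
static void model_purge(struct model_ifq *q)
{
    if (q->disc != NULL) {
        q->len -= q->disc->qlen;
        q->disc->qlen = 0;
    } else {
        q->len = 0;
    }
}

/* Attach with the fix: drain the plain queue before the discipline takes over. */
static void model_attach_patched(struct model_ifq *q, struct model_disc *d)
{
    model_purge(q);   /* models the added ifq_purge(&ifp->if_snd) */
    d->qlen = 0;
    q->disc = d;
}

/* Enable: purge, then report what is left (0 is what the KKASSERT expects). */
static int model_enable(struct model_ifq *q)
{
    model_purge(q);
    return q->len;
}

/* One packet is queued before attach; returns the leftover length after enable. */
static int model_leftover_with_fix(void)
{
    struct model_disc d = { 0 };
    struct model_ifq q = { 1, NULL };

    model_attach_patched(&q, &d);
    return model_enable(&q);
}
```

In this model `model_leftover_with_fix()` returns 0, so the condition checked by `KKASSERT(ifq->ifq_len == 0)` would hold.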

#8 Updated by corecode over 7 years ago

is this fixed by the patch?

#9 Updated by kylebutt over 7 years ago

Yeah, that fixed it.

Kyle
