Bug #2458

ath: ran out of descriptors

Added by Johannes.Hofmann over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/18/2012
Due date:
% Done: 0%

Description

After running for a while in a WPA-secured WLAN with an ath(4) card, I get:
ath: ran out of descriptors
and the network stops working.
Sometimes destroying wlan0 and running kldunload if_ath; kldload if_ath
restores the network, but in some cases I get:

Nov 18 12:45:06 plop kernel: Trace beginning at frame 0xd6264ca8
Nov 18 12:45:06 plop kernel: m_free(ffffffff,d47f1c00,0,d936dc00,d5a94400) at m_free+0xc6 0xc01f26ed
Nov 18 12:45:06 plop kernel: m_free(d47f7800,d6264cfc,c09c8846,d47f7800,c47576f0) at m_free+0xc6 0xc01f26ed
Nov 18 12:45:06 plop kernel: m_freem(d47f7800,c47576f0,d6264cfc,c09e8e58,d5cf9f80) at m_freem+0x15 0xc01f29c6
Nov 18 12:45:06 plop kernel: ieee80211_flush_ifq(d5a94400,d4c46500,ffffffff,d5e35478,c47576f0) at ieee80211_flush_ifq+0x70 0xc09c8846
Nov 18 12:45:06 plop kernel: ieee80211_newstate_task(d4c46500,1,c47576f4,1,c475770c) at ieee80211_newstate_task+0x231 0xc09e74e3
Nov 18 12:45:06 plop kernel: taskqueue_run(c47576f0,c475770c,0,c03d27b3,0) at taskqueue_run+0x93 0xc01dd304
Nov 18 12:45:06 plop kernel: taskqueue_thread_loop(d5e3544c,0,0,0,0) at taskqueue_thread_loop+0x4b 0xc01dd5ed
Nov 18 12:45:06 plop kernel: lwkt_exit() at lwkt_exit 0xc01bbab3

and the system freezes on kldunload if_ath

ath_lock.diff (2.3 KB) Johannes.Hofmann, 11/22/2012 01:27 PM

History

#1 Updated by Johannes.Hofmann over 1 year ago

So far I haven't been able to reproduce it with SMP disabled in the BIOS, so it could be SMP-related.

#2 Updated by Johannes.Hofmann over 1 year ago

With some wlan_assert_serialized() calls added, I found that ath_start is called without the serializer held, via ieee80211_handoff->ifq_dispatch->ath_start.

With the following changes I haven't seen the problem so far:

diff --git a/sys/dev/netif/ath/ath/if_ath.c b/sys/dev/netif/ath/ath/if_ath.c
index 2f22372..9b1ca66 100644
--- a/sys/dev/netif/ath/ath/if_ath.c
+++ b/sys/dev/netif/ath/ath/if_ath.c
@@ -452,9 +452,7 @@ ath_attach(u_int16_t devid, struct ath_softc *sc)
* Allocate tx+rx descriptors and populate the lists.
*/
wlan_assert_serialized();
- wlan_serialize_exit();
error = ath_desc_alloc(sc);
- wlan_serialize_enter();
if (error != 0) {
if_printf(ifp, "failed to allocate descriptors: %d\n", error);
goto bad;
diff --git a/sys/netproto/802_11/wlan/ieee80211_dragonfly.c b/sys/netproto/802_11/wlan/ieee80211_dragonfly.c
index 098598e..302a189 100644
--- a/sys/netproto/802_11/wlan/ieee80211_dragonfly.c
+++ b/sys/netproto/802_11/wlan/ieee80211_dragonfly.c
@@ -812,7 +812,6 @@ ieee80211_handoff(struct ifnet *dst_ifp, struct mbuf *m)

/* We may be sending a fragment so traverse the mbuf */
wlan_assert_serialized();
- wlan_serialize_exit();
for (; m; m = m0) {
struct altq_pktattr pktattr;

@@ -824,7 +823,6 @@ ieee80211_handoff(struct ifnet *dst_ifp, struct mbuf *m)

ifq_dispatch(dst_ifp, m, &pktattr);
}
- wlan_serialize_enter();

return (0);
}

I don't know whether this might create issues with other drivers and I'm not yet 100% certain
that it really fixes the problem in my case.
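
The call path described in this comment can be illustrated with a small userland model. This is only a sketch: it is single-threaded, performs no real locking, and every model_* name is hypothetical rather than taken from the DragonFly sources. It only shows where the driver's start routine ends up running without the serializer when the handoff drops it before dispatching.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not the DragonFly serializer API. */
struct model_serializer {
    bool held;                  /* true between enter() and exit() */
};

static void model_enter(struct model_serializer *s) { s->held = true; }
static void model_exit(struct model_serializer *s)  { s->held = false; }

#define MODEL_NDESC 4           /* arbitrary pool size for the model */

struct model_softc {
    struct model_serializer *serializer;
    int free_desc;              /* free tx descriptors */
    int unserialized_calls;     /* start calls made without the serializer */
};

/* Model of a driver start routine: consumes one descriptor per packet. */
static bool model_start(struct model_softc *sc)
{
    if (!sc->serializer->held)
        sc->unserialized_calls++;   /* a kernel assertion would fire here */
    if (sc->free_desc == 0)
        return false;               /* "ran out of descriptors" */
    sc->free_desc--;
    return true;
}

/* Handoff that drops the serializer around the dispatch (pre-patch shape). */
static void model_handoff_unlocked(struct model_softc *sc)
{
    model_exit(sc->serializer);
    model_start(sc);
    model_enter(sc->serializer);
}

/* Handoff that keeps the serializer held across the dispatch (patched shape). */
static void model_handoff_locked(struct model_softc *sc)
{
    model_start(sc);
}
```

In the model, only the unlocked handoff increments unserialized_calls. In a real SMP kernel that window is where two CPUs could touch the descriptor lists at the same time, which would be consistent with the SMP observation in update #1.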

#3 Updated by Johannes.Hofmann over 1 year ago

After looking at if_iwn.c, a more appropriate approach is probably to leave ieee80211_dragonfly.c alone and change the locking in if_ath.c, as in the attached patch.

#4 Updated by sepherosa over 1 year ago

On Fri, Nov 23, 2012 at 5:27 AM, Johannes Hofmann via Redmine wrote:
> After looking at if_iwn.c, a more appropriate approach is probably to leave ieee80211_dragonfly.c alone and change the locking in if_ath.c, as in the attached patch.

Well, I believe the problem is actually more serious: the ifnet's
if_serializer is not properly set to wlan_global_serializer
(in ieee80211_ifattach->if_attach, wlan_global_serializer should be
passed as the second parameter to if_attach instead of NULL). If you
are interested, that could be a good project. I should note
that you may experience many panics before things work. However,
once it works, it will no longer cause further problems. If anyone
wants to pick it up, please feel free to contact me if you need an
explanation of how the ifnet serializer mechanism works.

As for the patch, if other folks with the proper hardware could
confirm it works, we could just commit it as a workaround.
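
As a rough illustration of the point about the second parameter, here is a userland sketch. It assumes that an interface attached with NULL falls back to a private per-interface serializer; that fallback, and all model_* names, are assumptions made for this sketch and not a copy of the real if_attach.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a serializer; the flag is only a placeholder. */
struct model_serializer {
    bool held;
};

struct model_ifnet {
    struct model_serializer *if_serializer;         /* lock used for this ifnet */
    struct model_serializer  if_default_serializer; /* private fallback (assumed) */
};

/* Stand-in for the single lock shared by the wlan code. */
static struct model_serializer model_wlan_global_serializer;

/* Model of attach: use the caller's serializer, or the private one on NULL. */
static void model_if_attach(struct model_ifnet *ifp,
                            struct model_serializer *slz)
{
    ifp->if_default_serializer.held = false;
    ifp->if_serializer = (slz != NULL) ? slz : &ifp->if_default_serializer;
}

/* True when interface methods and wlan code are protected by the same lock. */
static bool model_shares_wlan_lock(const struct model_ifnet *ifp)
{
    return ifp->if_serializer == &model_wlan_global_serializer;
}
```

Under that assumption, an interface attached with NULL is protected by a different lock than the one the wlan code asserts on, so holding one says nothing about the other.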

Best Regards,
sephe


--
Tomorrow Will Never Die

#5 Updated by Johannes.Hofmann over 1 year ago

Thanks for looking into this! I tried to implement it as you suggested
and simply doing:

diff --git a/sys/netproto/802_11/wlan/ieee80211.c b/sys/netproto/802_11/wlan/ieee80211.c
index c4084d4..d6be645 100644
--- a/sys/netproto/802_11/wlan/ieee80211.c
+++ b/sys/netproto/802_11/wlan/ieee80211.c
@@ -304,7 +304,7 @@ ieee80211_ifattach(struct ieee80211com *ic,

ifp->if_addrlen = IEEE80211_ADDR_LEN;
ifp->if_hdrlen = 0;
- if_attach(ifp, NULL);
+ if_attach(ifp, &wlan_global_serializer);
ifp->if_mtu = IEEE80211_MTU_MAX;
ifp->if_broadcastaddr = ieee80211broadcastaddr;
ifp->if_output = null_output;
diff --git a/sys/netproto/802_11/wlan/ieee80211_ioctl.c b/sys/netproto/802_11/wlan/ieee80211_ioctl.c
index a31dad7..d7f2d92 100644
--- a/sys/netproto/802_11/wlan/ieee80211_ioctl.c
+++ b/sys/netproto/802_11/wlan/ieee80211_ioctl.c
@@ -3201,7 +3201,9 @@ ieee80211_ioctl_updatemulti(struct ieee80211com *ic)
TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
if (ifma->ifma_addr->sa_family != AF_LINK)
continue;
+ wlan_serialize_exit();
(void) if_addmulti(parent, ifma->ifma_addr, NULL);
+ wlan_serialize_enter();
}
}
parent->if_ioctl = ioctl;

seems to work for ath so far. Do you have a better idea how to deal
with the recursive call to if_addmulti()?
Of course the other wlan drivers would also need to be adjusted. I
can work on that if you confirm that this is the way to go.

Best Regards,
Johannes

#6 Updated by sepherosa over 1 year ago

On Thu, Nov 29, 2012 at 6:23 AM, Johannes Hofmann via Redmine wrote:
> seems to work for ath so far. Do you have a better idea how to deal
> with the recursive call to if_addmulti()?

Hmm, not much, but I believe you should release the serializer at the
beginning of ieee80211_ioctl_updatemulti() (if_delallmulti will try
to take the serializer too). Since altering and accessing multicast
addresses is serialized by netisr0, releasing the serializer here
probably will not hurt.

Best Regards,
sephe
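
The suggestion above can be sketched with a single-threaded model of a non-recursive lock. All model_* names are hypothetical, and the model counts a re-entry attempt instead of hanging or panicking as a real kernel would. It compares holding the serializer across callees that take it themselves with releasing it once at the start of the function.

```c
#include <stdbool.h>

/* Hypothetical model of a non-recursive serializer. */
struct model_serializer {
    bool held;
    int  self_deadlocks;    /* re-entry attempts by the current holder */
};

/* Returns false (and counts it) when the lock is already held. */
static bool model_enter(struct model_serializer *s)
{
    if (s->held) {
        s->self_deadlocks++;    /* a real kernel would hang or panic here */
        return false;
    }
    s->held = true;
    return true;
}

static void model_exit(struct model_serializer *s) { s->held = false; }

/* Callee that takes the serializer itself, as the multicast helpers
 * are described as doing in this thread. */
static void model_multi_op(struct model_serializer *s)
{
    if (model_enter(s))
        model_exit(s);
}

/* Caller keeps the serializer held across every callee. */
static void model_updatemulti_held(struct model_serializer *s, int naddrs)
{
    model_multi_op(s);                  /* delete-all step */
    for (int i = 0; i < naddrs; i++)
        model_multi_op(s);              /* one add per address */
}

/* Caller releases once at the beginning and re-enters once at the end. */
static void model_updatemulti_released(struct model_serializer *s, int naddrs)
{
    model_exit(s);
    model_multi_op(s);
    for (int i = 0; i < naddrs; i++)
        model_multi_op(s);
    model_enter(s);
}
```

Releasing once at the top covers the delete-all step as well as every add, which is the difference from wrapping only the if_addmulti() call.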


#7 Updated by Johannes.Hofmann over 1 year ago

I created a wlan_serialize branch in my leaf repo (~hofmann).
It's tested with ath and iwn so far.

#8 Updated by sepherosa over 1 year ago

On Mon, Dec 3, 2012 at 1:06 AM, Johannes Hofmann via Redmine wrote:
> I created a wlan_serialize branch in my leaf repo (~hofmann).
> It's tested with ath and iwn so far.

Cool. Do you have time to take it one step further? I would like to
put ASSERT_SERIALIZED(ifp->if_serializer) in the public methods of both
the vap and the physical interface (if_init, if_start, if_ioctl, etc.).

Best Regards,
sephe
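
A minimal sketch of what such entry-point checks could look like, written as a userland model. MODEL_ASSERT_SERIALIZED and the other model_* names are hypothetical; the macro counts violations instead of panicking, so the sketch only shows where the checks would sit, not how the kernel macro is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a serializer. */
struct model_serializer {
    bool held;
};

struct model_ifnet {
    struct model_serializer *if_serializer;
    int assert_failures;    /* entry points reached without the serializer */
    int starts;             /* packets "started" by the model */
};

/* Model assertion: counts a violation instead of panicking. */
#define MODEL_ASSERT_SERIALIZED(ifp)                    \
    do {                                                \
        if (!(ifp)->if_serializer->held)                \
            (ifp)->assert_failures++;                   \
    } while (0)

/* Each public method checks the serializer before doing anything else. */
static void model_if_init(struct model_ifnet *ifp)
{
    MODEL_ASSERT_SERIALIZED(ifp);
}

static void model_if_start(struct model_ifnet *ifp)
{
    MODEL_ASSERT_SERIALIZED(ifp);
    ifp->starts++;
}
```

Putting the check first in every public method means a caller that forgot the serializer is reported at the entry point rather than later as corrupted driver state.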


#9 Updated by Johannes.Hofmann over 1 year ago

Sure. I've updated the wlan_serialize branch in my leaf repo (~hofmann).
I've added the checks in ath and iwn (the devices I can test).
So far no problems for me.
But on IRC I got a negative report for ral.
It didn't work anymore with wlan_global_serializer as if_serializer.

Regards,
Johannes

PS: I also had wlan_assert_serialized() in the if_output functions, but it triggered.
I guess it's ok for those to be called without the if_serializer?

#10 Updated by Johannes.Hofmann about 1 year ago

After releasing the wlan serializer while loading the firmware image, ral also seems to work
(tested-by: luxh).
The wlan_serialize branch at leaf/~hofmann is updated accordingly.

#11 Updated by Johannes.Hofmann about 1 year ago

  • Status changed from New to Resolved
