Bug #987

an0: device timeout on x31

Added by jspringe about 6 years ago. Updated over 5 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Hi there,
I seem to be having bad luck with network devices :>!
While my remote machine is offline right now, I wanted to check a few things locally.
Because my PC is busy simulating things for a study project, I thought it
might be nice to test DragonFly stuff on a laptop while waiting for results.
Everything works as expected and I can connect to a wireless network properly, but then the wireless device permanently prints:
an0 device timeout
messages.
The laptop is an IBM X31, and the WLAN card works properly under Linux (Slackware).
sysctl -a | grep hw.an gives:
hw.an.an_cache_iponly: 1
hw.an.an_cache_mcastonly: 0

History

#1 Updated by jspringe about 6 years ago

message broken...

hw.an.an_cache_mode: dbm
hw.an.an_dump: dbm

dmesg lists nothing special except for the timeouts...

ping google.de

gives three normal responses at about 40 ms,
then a timeout and ~4000 ms responses, followed by various timeouts.

If someone could give a brief sketch of how to debug the module, I would not
be afraid to do it myself :).

Regards,
Tobias

#2 Updated by sepherosa about 6 years ago

On Wed, Apr 9, 2008 at 3:15 AM, Jost Tobias Springenberg
<> wrote:
> message broken...
>
> hw.an.an_cache_mode: dbm
> hw.an.an_dump: dbm
>
> dmesg lists nothing special except for the timeouts...
>
> ping google.de
>
> gives three normal responses at about 40 ms,
> then a timeout and ~4000 ms responses, followed by various timeouts.
>
> If someone could give a brief sketch of how to debug the module, I would not
> be afraid to do it myself :).

A device timeout probably means a TX desc is not set up correctly, but it
may also be caused by misconfiguring some TX-related registers. You could
take a look at an_start(), an_intr() and an_txeof(). BTW, NetBSD
seems to have done quite a good job on an(4).
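
For orientation, here is a minimal sketch of the watchdog convention that BSD network drivers commonly follow and that typically produces a "device timeout" message: the transmit routine arms a per-interface timer, the TX-completion handler disarms it, and a once-per-second tick fires the watchdog if no completion arrived in time. All names here (wd_if, wd_start, wd_txeof, wd_tick, WD_TIMEOUT) are hypothetical stand-ins, not the actual an(4) code.

```c
#include <assert.h>

#define WD_TIMEOUT 5            /* hypothetical: seconds before the watchdog fires */

struct wd_if {
    int timer;                  /* seconds left; 0 means the watchdog is disarmed */
    int timeouts;               /* how many "device timeout" events were reported */
};

/* Transmit path: a frame was handed to the hardware, so arm the watchdog. */
static void wd_start(struct wd_if *ifp)
{
    ifp->timer = WD_TIMEOUT;
}

/* TX completion path: the hardware reported the frame done, so disarm. */
static void wd_txeof(struct wd_if *ifp)
{
    ifp->timer = 0;
}

/* Called once per second; reports a timeout when the timer runs out. */
static void wd_tick(struct wd_if *ifp)
{
    assert(ifp->timer >= 0);
    if (ifp->timer > 0 && --ifp->timer == 0)
        ifp->timeouts++;        /* this is where "device timeout" would be logged */
}
```

In this model, a permanent stream of timeouts means the completion path never runs for some frames, which is why an_start(), an_intr() and an_txeof() are the places to look.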

Best Regards,
sephe

#3 Updated by jspringe almost 6 years ago

On Wed, 9 Apr 2008 09:48:56 +0800
"Sepherosa Ziehau" <> wrote:

I think I have found a hotfix for it; you were quite close when talking about the TX registers.
I have not yet had the time to really test it, though,
but I am willing to do so next weekend.
Is there anybody else using this driver, so that once I have it working we can check whether
the fix breaks anything for somebody else?

Regards,
Tobias

#4 Updated by jspringe almost 6 years ago

Hi there,
OK, I found a patch somewhere on a FreeBSD mailing list that was kind of OK.
The patch works: I can ping without any problem, and initializing an ssh session also works.
However, some time after switching the WLAN on and, e.g., transferring a file, the system simply freezes.
It does not drop to the kernel debug console or the like.
Any ideas on that one?

Greetings,
Tobias

--- if_anreg.h 2005-07-28 18:52:44.000000000 +0200
+++ if_anreg.h 2008-05-17 22:30:39.042056810 +0200
@@ -395,13 +395,16 @@ struct an_txframe_802_3 {
#define AN_PAYLOADTYPE_ETHER 0x0000
#define AN_PAYLOADTYPE_LLC 0x0010

-#define AN_TXCTL_80211 \
- (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_HEADERTYPE_80211| \
- AN_PAYLOADTYPE_LLC|AN_TXCTL_NORELEASE)
-
-#define AN_TXCTL_8023 \
- (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_HEADERTYPE_8023| \
- AN_PAYLOADTYPE_ETHER|AN_TXCTL_NORELEASE)
+#define AN_TXCTL_80211 (AN_HEADERTYPE_80211|AN_PAYLOADTYPE_LLC)
+
+#define AN_TXCTL_8023 (AN_HEADERTYPE_8023|AN_PAYLOADTYPE_ETHER)
+
+/*
+ * Additions to transmit control bits for MPI350
+ */
+
+#define AN_TXCTL_HW(x) ( x ? (AN_TXCTL_NORELEASE) : \
+ (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_TXCTL_NORELEASE))

#define AN_TXGAP_80211 0
#define AN_TXGAP_8023 0

--- if_an.c 2007-04-07 13:01:06.000000000 +0200
+++ if_an.c 2008-05-17 22:29:04.536671228 +0200
@@ -2529,7 +2529,7 @@ an_start(struct ifnet *ifp)
tx_frame_802_3.an_tx_802_3_payload_len,
(caddr_t)&sc->an_txbuf);

- txcontrol = AN_TXCTL_8023;
+ txcontrol = AN_TXCTL_8023 | AN_TXCTL_HW(sc->mpi350);
/* write the txcontrol only */
an_write_data(sc, id, 0x08, (caddr_t)&txcontrol,
sizeof(txcontrol));
@@ -2588,7 +2588,7 @@ an_start(struct ifnet *ifp)
tx_frame_802_3.an_tx_802_3_payload_len,
(caddr_t)&sc->an_txbuf);

- txcontrol = AN_TXCTL_8023;
+ txcontrol = AN_TXCTL_8023 | AN_TXCTL_HW(sc->mpi350);
/* write the txcontrol only */
bcopy((caddr_t)&txcontrol, &buf[0x08],
sizeof(txcontrol));

#5 Updated by jspringe almost 6 years ago

Okay,
unfortunately the system still freezes if I have traffic on an0.
I tried to incorporate the patch into the 12.1 release, just to make sure it has nothing to do with
the latest changes, but the same problem remains.
Could anyone give me a brief idea of how to track that freeze down?
Greetings,
Tobias

#6 Updated by sepherosa almost 6 years ago

On Sun, May 18, 2008 at 5:02 PM, Jost Tobias Springenberg
<> wrote:
> Okay,
> unfortunately the system still freezes if I have traffic on an0.

Do you mean even low-rate traffic like ping?

Things to look for when tracking down a freeze like this:
- writing to IO registers when the device is not initialized yet or when
the device is powered off
- writing to IO registers that do not exist
- initializing RX/TX ring related IO registers before the RX/TX ring is initialized
- an RX/TX ring that is not correctly initialized
- a buffer that is (wrongly) freed/trashed while the device is doing DMA

Best Regards,
sephe

#7 Updated by jspringe almost 6 years ago

Nope, ping works fine; it only freezes if I try to transfer, e.g., a tarball.

What really strikes me is the following:
I tried the 12.1 release and HEAD without any changes, just as they are in the repos.

On the 12.1 live CD I get the errors that I wrote about in my first post,
but no freezes at all, even when transferring files (although performance is horrible due to the massive timeouts).

On HEAD the system shows the same behavior but freezes after a few pings...

That suggests that the problem is not due to the changes I made but is a more general bug!?
What has changed in the network-related areas since 12.1?
Might it be related to Sephe's changes?

#8 Updated by sepherosa almost 6 years ago

It's a bug in how the driver handles IFF_OACTIVE. I
will work out a fix.
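
To illustrate what IFF_OACTIVE handling usually means in a BSD-style driver, here is a toy model under stated assumptions; it is not the contents of the fix mentioned above. The transmit routine sets the flag only when it finds no free TX slot, and the TX-completion handler clears it and restarts transmission. If the flag is set while slots are still free, or never cleared, the send queue stalls. The names toy_softc, toy_start, toy_txeof, TOY_RING_SIZE and TOY_IFF_OACTIVE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE   4       /* hypothetical TX ring depth */
#define TOY_IFF_OACTIVE 0x1     /* stand-in for the real IFF_OACTIVE flag */

struct toy_softc {
    int flags;                  /* interface flags */
    int in_flight;              /* frames handed to the hardware, not yet completed */
    int queued;                 /* frames waiting in the software send queue */
};

/* Transmit routine: move frames from the send queue into free TX slots. */
static void toy_start(struct toy_softc *sc)
{
    bool ready = false;         /* did we see at least one free TX slot? */

    if (sc->flags & TOY_IFF_OACTIVE)
        return;                 /* stalled; wait for a TX completion */

    while (sc->in_flight < TOY_RING_SIZE) {
        ready = true;
        if (sc->queued == 0)
            break;              /* nothing left to send */
        sc->queued--;
        sc->in_flight++;
    }

    /* Mark the output path busy only if no slot was free on entry. */
    if (!ready)
        sc->flags |= TOY_IFF_OACTIVE;
}

/* TX completion: one slot is free again, so clear the stall and restart. */
static void toy_txeof(struct toy_softc *sc)
{
    assert(sc->in_flight > 0);
    sc->in_flight--;
    sc->flags &= ~TOY_IFF_OACTIVE;
    toy_start(sc);
}
```

The point of the model is that the flag tracks ring occupancy, not whether the last dequeue returned a packet.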

Best Regards,
sephe

#9 Updated by sepherosa almost 6 years ago

Can you test the following patch on HEAD:
http://leaf.dragonflybsd.org/~sephe/if_an_oactive.diff

Best Regards,
sephe

#10 Updated by jspringe almost 6 years ago

On Sun, 18 May 2008 18:32:53 +0800
"Sepherosa Ziehau" <> wrote:

OK,
I have to run a whole buildworld/buildkernel cycle because I had tried the 12.2 release.
I'll do that and report back in 3-4 hours.
Greetings,
Tobias

#11 Updated by sepherosa almost 6 years ago

BTW, is your card an MPI350? Its TX desc setup looks suspicious
(around line 2615 of if_an.c):
for (i = 0; i < sizeof(an_tx_desc) / 4 ; i++) {
    CSR_MEM_AUX_WRITE_4(sc, AN_TX_DESC_OFFSET
        /* zero for now */
        + (0 * sizeof(an_tx_desc))
        + (i * 4),
        ((u_int32_t*)&an_tx_desc)[i]);
}

The physical address of the buffer is at the high end of the TX desc, but
the valid bit is at the low end, which means we tell the
hardware the TX desc is valid before we tell it the buffer's physical
address. This probably explains the watchdog timeout.

I think if your card is an MPI350, you probably want to change the above code into:
for (i = sizeof(an_tx_desc) / 4 - 1; i >= 0; --i) {
    CSR_MEM_AUX_WRITE_4(sc, AN_TX_DESC_OFFSET
        /* zero for now */
        + (0 * sizeof(an_tx_desc))
        + (i * 4),
        ((u_int32_t*)&an_tx_desc)[i]);
}
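
The ordering argument can be sketched with a toy device model. This assumes a made-up three-word descriptor (valid bit in word 0, buffer address in word 2) and a device that reacts the instant it sees the valid bit; neither is the real MPI350 layout or behavior, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DESC_WORDS 3        /* hypothetical descriptor size in 32-bit words */
#define TOY_DESC_VALID 0x1u     /* hypothetical valid bit, lives in word 0 */

struct toy_dev {
    uint32_t desc[TOY_DESC_WORDS];  /* descriptor as the device sees it */
    uint32_t latched_addr;          /* buffer address the device picked up */
};

/* One 32-bit write into device memory; the device reacts immediately. */
static void toy_dev_write(struct toy_dev *dev, int word, uint32_t val)
{
    assert(word >= 0 && word < TOY_DESC_WORDS);
    dev->desc[word] = val;
    /* Seeing the valid bit, the device reads the address word right away. */
    if (word == 0 && (val & TOY_DESC_VALID))
        dev->latched_addr = dev->desc[2];
}

/* Low word first: the valid bit lands before the address does. */
static uint32_t toy_write_forward(struct toy_dev *dev, const uint32_t *src)
{
    for (int i = 0; i < TOY_DESC_WORDS; i++)
        toy_dev_write(dev, i, src[i]);
    return dev->latched_addr;
}

/* High word first: the address is in place before the valid bit is set. */
static uint32_t toy_write_reverse(struct toy_dev *dev, const uint32_t *src)
{
    for (int i = TOY_DESC_WORDS - 1; i >= 0; --i)
        toy_dev_write(dev, i, src[i]);
    return dev->latched_addr;
}
```

With the forward loop the model device latches whatever stale address was in word 2; with the reverse loop it latches the address just written.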

Best Regards,
sephe

#12 Updated by jspringe almost 6 years ago

Yes, my card is an MPI350. I applied your patch and also tried the changes you suggested:
the freezes are gone, but the timeout problem remains...
I guess the card does not like the INTR bits :).
After putting my previous patch back in, I finally seem to have a working device!
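
As a rough illustration of what the earlier if_anreg.h patch does to the txcontrol byte: the two AN_PAYLOADTYPE values below are taken from the diff context in this thread, while every other bit value is an assumption made for this sketch and should be checked against the real header. The macro argument is parenthesized here for safety, and toy_txcontrol is a hypothetical helper.

```c
/* From the diff context in this thread. */
#define AN_PAYLOADTYPE_ETHER 0x0000
#define AN_PAYLOADTYPE_LLC   0x0010

/* ASSUMED bit values, for illustration only; verify against if_anreg.h. */
#define AN_HEADERTYPE_8023   0x0000
#define AN_TXCTL_TXOK_INTR   0x0002
#define AN_TXCTL_TXERR_INTR  0x0004
#define AN_TXCTL_NORELEASE   0x0020

/* Base control word no longer carries the interrupt bits. */
#define AN_TXCTL_8023 (AN_HEADERTYPE_8023|AN_PAYLOADTYPE_ETHER)

/* Hardware-specific bits: the MPI350 gets NORELEASE only. */
#define AN_TXCTL_HW(x) ((x) ? (AN_TXCTL_NORELEASE) : \
    (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_TXCTL_NORELEASE))

/* Hypothetical helper mirroring the assignment in an_start(). */
static unsigned char toy_txcontrol(int mpi350)
{
    return (unsigned char)(AN_TXCTL_8023 | AN_TXCTL_HW(mpi350));
}
```

Under these assumptions, toy_txcontrol(1) has neither interrupt bit set, while toy_txcontrol(0) keeps both, so cards other than the MPI350 see the same control word as before the patch.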

Should I post a diff with all the changes, so that you can commit them to HEAD?
As far as I can see, the changes should not affect anything but the MPI350.

Thanks for the great, fast responses; I guess I never would have sorted it out all by myself :=).
You can now call yourselves the only BSD that has an MPI350-firmware-friendly an driver,
as it seems that neither FreeBSD nor NetBSD has a working solution right now (as far as I can tell from reading their code), and OpenBSD did not even recognize my card.

Best Regards,
Tobias

#13 Updated by dillon almost 6 years ago

:Should I post a diff with all the changes, so that you can commit them back to HEAD ?
:As far as I can see the changes should not affect anything else but MPI350.
:...
:Best Regards,
:Tobias

Yes, please do. Having a known-working data point is always good :-)

-Matt
Matthew Dillon
<>

#14 Updated by jspringe almost 6 years ago

Okay, here you go; this is what my system is running right now.
It seems to work flawlessly!

--- /usr/src/sys/dev/netif/an/if_anreg.h 2005-07-28 18:52:44 +0200
+++ /usr/src/sys/dev/netif/an/if_anreg.h 2008-05-18 17:01:52 +0200
@@ -395,13 +395,16 @@ struct an_txframe_802_3 {
#define AN_PAYLOADTYPE_ETHER 0x0000
#define AN_PAYLOADTYPE_LLC 0x0010

-#define AN_TXCTL_80211 \
- (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_HEADERTYPE_80211| \
- AN_PAYLOADTYPE_LLC|AN_TXCTL_NORELEASE)
-
-#define AN_TXCTL_8023 \
- (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_HEADERTYPE_8023| \
- AN_PAYLOADTYPE_ETHER|AN_TXCTL_NORELEASE)
+#define AN_TXCTL_80211 (AN_HEADERTYPE_80211|AN_PAYLOADTYPE_LLC)
+
+#define AN_TXCTL_8023 (AN_HEADERTYPE_8023|AN_PAYLOADTYPE_ETHER)
+
+/*
+ * Additions to transmit control bits for MPI350
+ */
+
+#define AN_TXCTL_HW(x) ( x ? (AN_TXCTL_NORELEASE) : \
+ (AN_TXCTL_TXOK_INTR|AN_TXCTL_TXERR_INTR|AN_TXCTL_NORELEASE))

#define AN_TXGAP_80211 0
#define AN_TXGAP_8023 0

--- /usr/src/sys/dev/netif/an/if_an.c 2008-05-14 13:59:18 +0200
+++ /usr/src/sys/dev/netif/an/if_an.c 2008-05-18 17:02:52 +0200
@@ -2480,7 +2480,7 @@ an_start(struct ifnet *ifp)
struct mbuf *m0 = NULL;
struct an_txframe_802_3 tx_frame_802_3;
struct ether_header *eh;
- int id, idx, i;
+ int id, idx, i, ready;
unsigned char txcontrol;
struct an_card_tx_desc an_tx_desc;
u_int8_t *buf;
@@ -2501,12 +2501,14 @@ an_start(struct ifnet *ifp)
return;
}

+ ready = 0;
idx = sc->an_rdata.an_tx_prod;

if (!sc->mpi350) {
bzero((char *)&tx_frame_802_3, sizeof(tx_frame_802_3));

while (sc->an_rdata.an_tx_ring[idx] == 0) {
+ ready = 1;
m0 = ifq_dequeue(&ifp->if_snd, NULL);
if (m0 == NULL)
break;
@@ -2529,7 +2531,7 @@ an_start(struct ifnet *ifp)
tx_frame_802_3.an_tx_802_3_payload_len,
(caddr_t)&sc->an_txbuf);

- txcontrol = AN_TXCTL_8023;
+ txcontrol = AN_TXCTL_8023 | AN_TXCTL_HW(sc->mpi350);
/* write the txcontrol only */
an_write_data(sc, id, 0x08, (caddr_t)&txcontrol,
sizeof(txcontrol));
@@ -2563,11 +2565,12 @@ an_start(struct ifnet *ifp)
CSR_WRITE_2(sc, AN_INT_EN(sc->mpi350), 0);

while (sc->an_rdata.an_tx_empty ||
- idx != sc->an_rdata.an_tx_cons) {
+ idx != sc->an_rdata.an_tx_cons) {
+ ready = 1;
m0 = ifq_dequeue(&ifp->if_snd, NULL);
- if (m0 == NULL) {
+ if (m0 == NULL)
break;
- }
+
buf = sc->an_tx_buffer[idx].an_dma_vaddr;

eh = mtod(m0, struct ether_header *);
@@ -2588,7 +2591,7 @@ an_start(struct ifnet *ifp)
tx_frame_802_3.an_tx_802_3_payload_len,
(caddr_t)&sc->an_txbuf);

- txcontrol = AN_TXCTL_8023;
+ txcontrol = AN_TXCTL_8023 | AN_TXCTL_HW(sc->mpi350);
/* write the txcontrol only */
bcopy((caddr_t)&txcontrol, &buf[0x08],
sizeof(txcontrol));
@@ -2609,7 +2612,7 @@ an_start(struct ifnet *ifp)
an_tx_desc.an_len = 0x44 +
tx_frame_802_3.an_tx_802_3_payload_len;
an_tx_desc.an_phys = sc->an_tx_buffer[idx].an_dma_paddr;
- for (i = 0; i < sizeof(an_tx_desc) / 4 ; i++) {
+ for (i = sizeof(an_tx_desc) / 4 - 1; i >= 0 ; --i) {
CSR_MEM_AUX_WRITE_4(sc, AN_TX_DESC_OFFSET
/* zero for now */
+ (0 * sizeof(an_tx_desc))
@@ -2637,7 +2640,7 @@ an_start(struct ifnet *ifp)
CSR_WRITE_2(sc, AN_INT_EN(sc->mpi350), AN_INTRS(sc->mpi350));
}

- if (m0 != NULL)
+ if (!ready)
ifp->if_flags |= IFF_OACTIVE;

sc->an_rdata.an_tx_prod = idx;

#15 Updated by jspringe almost 6 years ago

Last note on this one:
I have now used the laptop for one day and tested three different networks; everything seems to work nicely!
It seems to be stable, and speed and latencies are nice.
The bug can be closed and the diffs against HEAD can be applied!
Thanks again,
Tobias

#16 Updated by sepherosa almost 6 years ago

On Mon, May 19, 2008 at 2:31 AM, Jost Tobias Springenberg
<> wrote:
>
>> Yes, please do. Having a known-working data point is always good :-)
>>
>> -Matt
>> Matthew Dillon
>> <>
>
> Okay here you go, this is what my system is running with right now.
> Seems to work flawlessly!

Committed! Thanks :)

Cheers,
sephe

#17 Updated by matthias over 5 years ago

Committed by sephe@
