Bug #1886

ath/wpa_supplicant/dhclient lost connection

Added by elekktretterr almost 4 years ago. Updated 6 months ago.

Status: Feedback
Start date:
Priority: Normal
Due date:
Assignee: tuxillo
% Done: 0%
Category: Networking
Target version: 3.8.0

Description

Hi,

This is on i386/master from about a week ago. It's an SMP laptop. For some
reason after about 6 hours of running, wireless drops out, but
wpa_supplicant is still running, and dhclient is running at near 100% CPU.
It's fixed by sending kill -HUP to wpa_supplicant and dhclient, which brings
the network back up and reacquires an IP address.

Is anyone experiencing a similar issue?

Petr

History

#1 Updated by matthias almost 4 years ago

* wrote:
> Hi,
>
> This is on i386/master from about a week ago. It's an SMP laptop. For some
> reason after about 6 hours of running, wireless drops out, but
> wpa_supplicant is still running, and dhclient is running at near 100% CPU.
> It's fixed by kill -HUP wpa_supplicant and dhclient to bring the network
> up and reacquire IP.
>
> Anyone experiencing similar a issue?

Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
master. Killing the dhclient helps.

Cheers

Matthias

#2 Updated by sjg almost 4 years ago

On Tue, Oct 26, 2010 at 12:17 AM, Matthias Schmidt
<> wrote:
> * wrote:
>> Hi,
>>
>> This is on i386/master from about a week ago. It's an SMP laptop. For some
>> reason after about 6 hours of running, wireless drops out, but
>> wpa_supplicant is still running, and dhclient is running at near 100% CPU.
>> It's fixed by kill -HUP wpa_supplicant and dhclient to bring the network
>> up and reacquire IP.
>>
>> Anyone experiencing similar a issue?
>
> Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
> master.  Killing the dhclient helps.
>
> Cheers
>
>        Matthias
>

Recent master kernel -and- world?

If this is seen again, truss or ktrace output of dhclient would be helpful.

If you could install a debug dhclient now, in the event it happens
again you could induce a core (or at least gdb the process and get a
trace), that might prove helpful as well.

Why, specifically, does the network stop working, if you can identify
it? Default route? resolv.conf overwritten?

Thanks,
Sam

#3 Updated by matthias almost 4 years ago

Hi,

* Samuel J. Greear wrote:
>
> Recent master kernel -and- world?

I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct 5
22:26:32 CEST 2010. No time yet to update the box to latest master.
It's an old UP machine, so building world/kernel takes ages (even with
quick).

> If this is seen again, truss or ktrace output of dhclient would be helpful.

I attached truss to dhclient, and it shows thousands of poll calls:

poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)

This sequence keeps repeating forever.

More debugging tomorrow, I'm currently short of time.

Cheers

Matthias
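
A note on reading the trace above: the third argument to poll(2) is the timeout in milliseconds, and 0x474a0 is 292000, so each call was allowed to sleep for up to 292 seconds yet came back with 1 right away. The sketch below shows how a descriptor stuck in a hangup or error state can produce exactly that pattern. It is only an illustration under stated assumptions: a pipe with its write end closed stands in for whatever descriptor dhclient was polling, a single descriptor is used instead of the two in the trace, and `count_immediate_returns` is a hypothetical name, not anything from dhclient.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Poll a descriptor that is permanently "ready" and count how many
 * calls return 1 without ever sleeping. Returns -1 if the pipe
 * cannot be created.
 */
int count_immediate_returns(int rounds)
{
    int pipefd[2];
    int immediate = 0;
    struct pollfd fds[1];

    assert(rounds >= 0);
    if (pipe(pipefd) != 0)
        return -1;

    /* With the write end closed, the read end reports EOF/hangup forever. */
    close(pipefd[1]);

    fds[0].fd = pipefd[0];
    fds[0].events = POLLIN;

    for (int i = 0; i < rounds; i++) {
        fds[0].revents = 0;
        /* Same timeout as in the trace: 0x474a0 == 292000 ms. */
        if (poll(fds, 1, 0x474a0) == 1)
            immediate++;
        /* Nothing here clears the condition, so the next call returns at once too. */
    }

    close(pipefd[0]);
    return immediate;
}
```

Calling `count_immediate_returns(1000)` finishes almost instantly despite the long timeout, which is the same shape as the poll/gettimeofday pairs in the truss output.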

#4 Updated by sjg almost 4 years ago

On Wed, Oct 27, 2010 at 10:31 AM, Matthias Schmidt
<> wrote:
> Hi,
>
> * Samuel J. Greear wrote:
>>
>> Recent master kernel -and- world?
>
> I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct  5
> 22:26:32 CEST 2010.  No time yet to update the box to latest master.
> Its an old UP machine, so building world/kernel takes ages (even with
> quick).
>
>> If this is seen again, truss or ktrace output of dhclient would be helpful.
>
> I attached truss and you see thousands of poll messages:
>
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0)                     = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0)                     = 0 (0x0)
>
> This sequence keeps repeating forever.
>
> More debugging tomorrow, I'm currently short of time.
>
> Cheers
>
>        Matthias
>

I patched a known infinite loop and another possible infinite loop on Sept. 9.

http://gitweb.dragonflybsd.org/dragonfly.git/commit/a05d48e6c7b12ad7abca8e3a3264413ac6f54003
http://gitweb.dragonflybsd.org/dragonfly.git/commit/db2a08265dc67538d417f14ccdc9fdef6fb0d60b

So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.

It looks like it would be much more efficient, also, if we moved the
call to time() behind the error check. (There went my attempt at humor
for the day).

Sam
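
For readers following along, here is a minimal sketch of the ordering mentioned above, with the error check placed before the clock is read. It is not taken from dispatch.c. The names `wait_once`, `enum wait_result` and `demo_wait_once` are hypothetical, and treating EAGAIN and EINTR as "retry" is an assumption about what a loop like this would want.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

enum wait_result { WAIT_RETRY, WAIT_FAILED, WAIT_TIMEOUT, WAIT_READY };

/* One loop iteration: inspect poll()'s result first, read the clock only if it succeeded. */
enum wait_result wait_once(struct pollfd *fds, nfds_t nfds, int timeout_ms, time_t *now)
{
    int n;

    assert(now != NULL);
    n = poll(fds, nfds, timeout_ms);
    if (n == -1) {
        /* Transient errors: try again without touching the clock. */
        if (errno == EAGAIN || errno == EINTR)
            return WAIT_RETRY;
        return WAIT_FAILED;
    }

    *now = time(NULL);   /* needed for both the timeout and the ready case */
    return n == 0 ? WAIT_TIMEOUT : WAIT_READY;
}

/* Self-check: an empty pipe times out, a pipe holding one byte is ready. Returns 1 on success. */
int demo_wait_once(void)
{
    int p[2];
    time_t now = 0;
    int ok;

    if (pipe(p) != 0)
        return 0;

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };

    ok = wait_once(&pfd, 1, 0, &now) == WAIT_TIMEOUT;
    ok = ok && write(p[1], "x", 1) == 1;
    ok = ok && wait_once(&pfd, 1, 0, &now) == WAIT_READY;

    close(p[0]);
    close(p[1]);
    return ok;
}
```

Note that the trace in this ticket shows poll returning 1, not -1, so this path would not explain the spin by itself.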

#5 Updated by sjg almost 4 years ago

On Wed, Oct 27, 2010 at 4:38 PM, Samuel J. Greear <> wrote:
> So if it's newer than that there's definitely still a problem. Note
> that there are two poll(2) loops in dhclient, make sure you're looking
> at the right one if you dig into it. Probably it is the one in
> dispatch.c returning -1 and EAGAIN, the question is why.
>

I obviously didn't look at the truss output very hard. More likely, poll
is reporting POLLERR on that socket and the code is falling through the
loop. We will need to determine whether bpf is broken or unhappy in some
fashion, or whether it is a real error condition that dhclient should be
handling.

Sam

#6 Updated by jaydg over 2 years ago

  • Description updated (diff)

It still happens on current master, but only on wireless devices. Wired interfaces are never affected.
I tried running a debug dhclient, but it seems to be permanently stuck in poll().

#7 Updated by tuxillo about 2 years ago

  • Status changed from New to In Progress
  • Assignee changed from 0 to tuxillo

Grab.

#8 Updated by tuxillo over 1 year ago

Joachim, all,

Is it possible to check whether this is still the case? dhclient was updated a few months ago and, although I doubt any of the fixes were related, it would be great to check it out.

Cheers,
Antonio Huete

#9 Updated by Johannes.Hofmann 12 months ago

  • Description updated (diff)

I just had a dhclient process using 100% CPU on recent master (Fri Sep 13 17:26:22 2013).
It's on an iwn(4) wlan interface, and it happened after running normally for a while.

Johannes

#10 Updated by Vitaly_loki 12 months ago

Hello!

I experience exactly the same issue with iwn(4) on an HP 6530b.

2013/9/20

>
> Issue #1886 has been updated by Johannes.Hofmann.
>
> Description updated
>
> I just had a dhclient process using 100% CPU on recent master (Fri Sep 13
> 17:26:22 2013).
> It's on an iwn(4) wlan interface after some while of running normally.
>
> Johannes

--
Vitaly

#11 Updated by tuxillo 6 months ago

  • Category set to Networking
  • Status changed from In Progress to Feedback
  • Target version set to 3.8.0

Hi guys,

Can anybody please provide a core dump when this issue happens?
If not, are there instructions on how to trigger it? Does it happen with device drivers other than iwn(4)? The ticket was originally logged for ath(4), after all.

Thanks,
Antonio Huete
