Bug #1886

closed

ath/wpa_supplicant/dhclient lost connection

Added by elekktretterr over 13 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Networking
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Hi,

This is on i386/master from about a week ago. It's an SMP laptop. For some
reason after about 6 hours of running, wireless drops out, but
wpa_supplicant is still running, and dhclient is running at near 100% CPU.
The workaround is to kill -HUP wpa_supplicant and dhclient, which brings
the network back up and reacquires an IP address.

Is anyone experiencing a similar issue?

Petr

Actions #1

Updated by matthias over 13 years ago

Hi,

Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
master. Killing the dhclient helps.

Cheers

Matthias

Actions #2

Updated by sjg over 13 years ago

On Tue, Oct 26, 2010 at 12:17 AM, Matthias Schmidt wrote:

Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
master. Killing the dhclient helps.

Recent master kernel and world?

If this is seen again, truss or ktrace output of dhclient would be helpful.

If you could install a debug dhclient now, in the event it happens
again you could induce a core (or at least gdb the process and get a
trace), that might prove helpful as well.

Why, specifically, does the network stop working, if you can identify
it? Default route? resolv.conf overwritten?

Thanks,
Sam

Actions #3

Updated by matthias over 13 years ago

Hi,

  • Samuel J. Greear wrote:

Recent master kernel and world?

I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct 5
22:26:32 CEST 2010. No time yet to update the box to latest master.
It's an old UP machine, so building world/kernel takes ages (even with
quick).

If this is seen again, truss or ktrace output of dhclient would be helpful.

I attached truss to the process and saw thousands of poll calls:

poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)

This sequence keeps repeating forever.
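
For readers who want to see this pattern in isolation: in the trace, poll() is called on 2 descriptors with a timeout of 0x474a0 ms (292000 ms) and returns 1 straight away every time. The sketch below reproduces that shape under stated assumptions. It is not dhclient code; a pipe with one unread byte stands in for whatever descriptor stays "ready", a single descriptor is polled instead of two, and `count_immediate_wakeups` is a hypothetical helper name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Illustration only, not dhclient code. A pipe stands in for the polled
 * descriptor. One byte is written and never read, so the read end stays
 * ready and every poll() returns 1 immediately, whatever the timeout.
 * Returns how many of `iterations` calls returned 1, or -1 if setup fails.
 */
static int
count_immediate_wakeups(int iterations)
{
	int pfd[2];
	int hits = 0;

	if (pipe(pfd) == -1)
		return -1;
	if (write(pfd[1], "x", 1) != 1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	struct pollfd fds[1];
	fds[0].fd = pfd[0];
	fds[0].events = POLLIN;

	for (int i = 0; i < iterations; i++) {
		fds[0].revents = 0;
		/* Same timeout value as in the trace: 0x474a0 ms = 292 s. */
		if (poll(fds, 1, 0x474a0) == 1)
			hits++;
		/* The pending event is deliberately left unconsumed. */
	}

	close(pfd[0]);
	close(pfd[1]);
	return hits;
}
```

Calling `count_immediate_wakeups(5)` returns without ever waiting for the timeout, because nothing clears the condition that makes the descriptor ready. A loop of this shape spins at full CPU, which matches the symptom reported here, although the actual cause inside dhclient may differ.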

More debugging tomorrow, I'm currently short of time.

Cheers

Matthias

Actions #4

Updated by sjg over 13 years ago

On Wed, Oct 27, 2010 at 10:31 AM, Matthias Schmidt wrote:

I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct  5
22:26:32 CEST 2010.

I patched a known infinite loop and another possible infinite loop on Sept. 9.

http://gitweb.dragonflybsd.org/dragonfly.git/commit/a05d48e6c7b12ad7abca8e3a3264413ac6f54003
http://gitweb.dragonflybsd.org/dragonfly.git/commit/db2a08265dc67538d417f14ccdc9fdef6fb0d60b

So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.

It looks like it would be much more efficient, also, if we moved the
call to time() behind the error check. (There went my attempt at humor
for the day).

Sam

Actions #5

Updated by sjg over 13 years ago

On Wed, Oct 27, 2010 at 4:38 PM, Samuel J. Greear wrote:

So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.

I obviously didn't look at the truss output very hard. poll() returns 1,
not -1, in the trace, so it is probably reporting POLLERR on that socket
and falling through the loop. We will need to determine whether bpf is
broken or unhappy in some fashion, or whether it is a real error
condition we should be handling in dhclient.
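
To illustrate the "falling through" case: poll() sets POLLERR, POLLHUP and POLLNVAL in revents even when the caller did not ask for them, so a loop that only tests POLLIN never notices the error and goes straight back into poll(). The following is a minimal sketch of checking those bits explicitly. It is not the code in dispatch.c; the `fd_state` enum and both function names are made up for illustration.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical classification of one pollfd result (not dhclient's). */
enum fd_state { FD_IDLE, FD_READABLE, FD_FAILED };

/*
 * Inspect revents for one descriptor. POLLERR and POLLNVAL are treated
 * as failures first. POLLHUP counts as a failure only when no data is
 * readable, so pending input can still be drained.
 */
static enum fd_state
classify_revents(short revents)
{
	if (revents & (POLLERR | POLLNVAL))
		return FD_FAILED;
	if (revents & POLLIN)
		return FD_READABLE;
	if (revents & POLLHUP)
		return FD_FAILED;
	return FD_IDLE;
}

/*
 * Poll once and count descriptors in a failed state. Returns the poll()
 * result unchanged when it is 0 (timeout) or -1 (error, see errno).
 */
static int
count_failed_fds(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int n = poll(fds, nfds, timeout_ms);
	int failed = 0;

	if (n <= 0)
		return n;
	for (nfds_t i = 0; i < nfds; i++) {
		if (classify_revents(fds[i].revents) == FD_FAILED)
			failed++;
	}
	return failed;
}
```

A caller that gets a nonzero count could log the condition and close or reopen the descriptor instead of polling it again. Whether that is the right reaction for dhclient depends on why bpf reports the error, which is still the open question here.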

Sam

Actions #6

Updated by jaydg about 12 years ago

  • Description updated (diff)

It still happens on current master, but only when working on a wireless device. Wired interfaces are never affected.
I tried to do a debug dhclient, but it seems to be permanently stuck in poll().

Actions #7

Updated by tuxillo over 11 years ago

  • Status changed from New to In Progress
  • Assignee changed from 0 to tuxillo

Grab.

Actions #8

Updated by tuxillo about 11 years ago

Joachim, all,

Is it possible to check whether this is still the case? dhclient was updated a few months ago and, although I doubt any of the fixes were related, it would be great to check.

Cheers,
Antonio Huete

Actions #9

Updated by Johannes.Hofmann over 10 years ago

  • Description updated (diff)

I just had a dhclient process using 100% CPU on recent master (Fri Sep 13 17:26:22 2013).
It happened on an iwn(4) wlan interface after running normally for a while.

Johannes

Actions #10

Updated by Vitaly_loki over 10 years ago

Hello!

I am experiencing exactly the same issue with iwn on an HP6530b.

Vitaly

Actions #11

Updated by tuxillo about 10 years ago

  • Category set to Networking
  • Status changed from In Progress to Feedback
  • Target version set to 3.8

Hi guys,

Can anybody please provide a coredump when this issue happens?
If not, are there instructions on how to trigger it? Does it happen with device drivers other than iwn(4)? The ticket was originally logged for ath(4).

Thanks,
Antonio Huete

Actions #12

Updated by tuxillo about 9 years ago

  • Assignee deleted (tuxillo)
  • Target version changed from 3.8 to Unverifiable

Can't be verified until the reporter provides some feedback.
Also i386 is no longer supported.

Actions #13

Updated by liweitianux almost 5 years ago

  • Status changed from Feedback to Resolved