Bug #1886
closed
ath/wpa_supplicant/dhclient lost connection
Added by elekktretterr about 14 years ago. Updated over 5 years ago.
0%
Description
Hi,
This is on i386/master from about a week ago. It's an SMP laptop. For some
reason after about 6 hours of running, wireless drops out, but
wpa_supplicant is still running, and dhclient is running at near 100% CPU.
It's fixed by sending kill -HUP to wpa_supplicant and dhclient, which brings
the network up and reacquires an IP.
Is anyone experiencing a similar issue?
Petr
Updated by matthias about 14 years ago
Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
master. Killing the dhclient helps.
Cheers
Matthias
Updated by sjg about 14 years ago
Recent master kernel and world?
If this is seen again, truss or ktrace output of dhclient would be helpful.
If you could install a debug dhclient now, in the event it happens
again you could induce a core (or at least gdb the process and get a
trace), that might prove helpful as well.
Why, specifically, does the network stop working, if you can identify
it? Default route? resolv.conf overwritten?
Thanks,
Sam
Updated by matthias about 14 years ago
Hi,
- Samuel J. Greear wrote:
Recent master kernel and world?
I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct 5
22:26:32 CEST 2010. No time yet to update the box to latest master.
It's an old UP machine, so building world/kernel takes ages (even with
quick).
If this is seen again, truss or ktrace output of dhclient would be helpful.
I attached truss to the process and saw thousands of poll calls:
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
This sequence keeps repeating forever.
More debugging tomorrow, I'm currently short of time.
Cheers
Matthias
Updated by sjg about 14 years ago
I patched a known infinite loop and another possible infinite loop on Sept. 9.
http://gitweb.dragonflybsd.org/dragonfly.git/commit/a05d48e6c7b12ad7abca8e3a3264413ac6f54003
http://gitweb.dragonflybsd.org/dragonfly.git/commit/db2a08265dc67538d417f14ccdc9fdef6fb0d60b
So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.
It looks like it would be much more efficient, also, if we moved the
call to time() behind the error check. (There went my attempt at humor
for the day).
Sam
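To make the remark about the time() call concrete, here is a minimal sketch in C of a poll(2) wrapper that retries transient errors and refreshes its timestamp only once poll(2) has succeeded. It is not dhclient's dispatch.c code. The name wait_for_events and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

/*
 * Hypothetical helper, not taken from dhclient.
 * Waits for events on fds and stores the current time in *now only after
 * poll() has passed the error check.
 * Returns the number of ready descriptors (0 on timeout), or -1 with errno
 * set if poll() failed with something other than EAGAIN or EINTR.
 */
static int
wait_for_events(struct pollfd *fds, nfds_t nfds, int timeout_ms, time_t *now)
{
	int count;

	assert(now != NULL);

	/* Simplification: a retry waits for the full timeout again. */
	do {
		count = poll(fds, nfds, timeout_ms);
	} while (count == -1 && (errno == EAGAIN || errno == EINTR));

	if (count == -1)
		return (-1);	/* real error: leave *now untouched */

	time(now);		/* only reached when poll() succeeded */
	return (count);
}
```

A caller would check for -1 first and look at each revents field only when the return value is positive. Note that this ordering does nothing about a poll(2) that keeps succeeding immediately, which is what the truss output shows.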
Updated by sjg about 14 years ago
On Wed, Oct 27, 2010 at 4:38 PM, Samuel J. Greear <sjg@evilcode.net> wrote:
So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.
I obviously didn't look at the truss output very hard. poll is returning
1 there, not -1, so it is probably reporting POLLERR on that socket and
falling through the loop. We will need to determine whether bpf is broken
or unhappy in some fashion, or whether it is a real error condition we
should be handling in dhclient.
Sam
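For reference, the truss lines show poll called with two descriptors and a timeout of 0x474a0 (292000 ms), yet returning 1 immediately every time. The sketch below, in C, shows how that pattern can arise and one way a loop could guard against it. It is an illustration under assumptions, not dhclient code: check_fd, make_hung_up_fd and the wait_status values are hypothetical names, and a pipe whose writer has closed stands in for whatever condition the bpf descriptor is actually reporting.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum wait_status { WAIT_IDLE, WAIT_READABLE, WAIT_FAILED };

/*
 * Hypothetical helper: poll a single descriptor and classify the result.
 * Error bits are tested before POLLIN. A loop that only tests POLLIN would
 * ignore POLLERR/POLLHUP/POLLNVAL, call poll() again, and get the same
 * answer straight away, which is a busy loop.
 * Simplifications: any poll() failure counts as WAIT_FAILED, and data that
 * is still readable on a hung-up descriptor is not drained.
 */
static enum wait_status
check_fd(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int count;

	assert(fd >= 0);	/* poll() silently ignores negative descriptors */

	count = poll(&pfd, 1, timeout_ms);
	if (count == -1)
		return (WAIT_FAILED);
	if (count == 0)
		return (WAIT_IDLE);	/* timed out, nothing pending */
	if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
		return (WAIT_FAILED);	/* error condition, not data */
	if (pfd.revents & POLLIN)
		return (WAIT_READABLE);
	return (WAIT_IDLE);
}

/*
 * Demo helper: return the read end of a pipe whose write end has been
 * closed, so poll() reports a hang-up on it without waiting.
 * Returns -1 if the pipe cannot be created.
 */
static int
make_hung_up_fd(void)
{
	int p[2];

	if (pipe(p) == -1)
		return (-1);
	close(p[1]);
	return (p[0]);
}
```

Calling check_fd(make_hung_up_fd(), 292000) returns without sleeping for the timeout, because the descriptor is reported as ready with an error bit set. Whether dhclient should close and reopen its descriptor, exit, or do something else in that situation is the open question in this ticket.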
Updated by jaydg almost 13 years ago
- Description updated (diff)
It still happens on current master, but only when working on a wireless device. Wired interfaces are never affected.
I tried to do a debug dhclient, but it seems to be permanently stuck in poll().
Updated by tuxillo over 12 years ago
- Status changed from New to In Progress
- Assignee changed from 0 to tuxillo
Grab.
Updated by tuxillo almost 12 years ago
Joachim, all,
Is it possible to check whether this is still the case? dhclient was updated a few months ago and, although I doubt any of the fixes were related, it would be great to check it out.
Cheers,
Antonio Huete
Updated by Johannes.Hofmann about 11 years ago
- Description updated (diff)
I just had a dhclient process using 100% CPU on recent master (Fri Sep 13 17:26:22 2013).
It's on an iwn(4) wlan interface, after running normally for a while.
Johannes
Updated by Vitaly_loki about 11 years ago
Hello!
I experience exactly the same issue with iwn on an HP 6530b.
Vitaly
Updated by tuxillo over 10 years ago
- Category set to Networking
- Status changed from In Progress to Feedback
- Target version set to 3.8
Hi guys,
Can anybody please provide a coredump when this issue happens?
If not, are there instructions on how to trigger it? Does it happen with device drivers other than iwn(4)? The ticket was originally logged for ath(4).
Thanks,
Antonio Huete
Updated by tuxillo almost 10 years ago
- Assignee deleted (tuxillo)
- Target version changed from 3.8 to Unverifiable
Can't be verified until the reporter provides some feedback.
Also i386 is no longer supported.
Updated by liweitianux over 5 years ago
- Status changed from Feedback to Resolved