Bug #1886
ath/wpa_supplicant/dhclient lost connection
| Status: | In Progress | Start date: | |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | - | | |
| Target version: | - | | |
Description
Hi,
This is on i386/master from about a week ago. It's an SMP laptop. For some
reason after about 6 hours of running, wireless drops out, but
wpa_supplicant is still running, and dhclient is running at near 100% CPU.
Sending SIGHUP (kill -HUP) to wpa_supplicant and dhclient brings the
network back up and reacquires an IP address.
Is anyone experiencing a similar issue?
Petr
History
Updated by matthias over 2 years ago
* elekktretterr@exemail.com.au wrote:
> Hi,
>
> This is on i386/master from about a week ago. It's an SMP laptop. For some
> reason after about 6 hours of running, wireless drops out, but
> wpa_supplicant is still running, and dhclient is running at near 100% CPU.
> It's fixed by kill -HUP wpa_supplicant and dhclient to bring the network
> up and reacquire IP.
>
> Anyone experiencing similar a issue?
Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
master. Killing the dhclient helps.
Cheers
Matthias
Updated by sjg over 2 years ago
On Tue, Oct 26, 2010 at 12:17 AM, Matthias Schmidt
<matthias@dragonflybsd.org> wrote:
> * elekktretterr@exemail.com.au wrote:
>> Hi,
>>
>> This is on i386/master from about a week ago. It's an SMP laptop. For some
>> reason after about 6 hours of running, wireless drops out, but
>> wpa_supplicant is still running, and dhclient is running at near 100% CPU.
>> It's fixed by kill -HUP wpa_supplicant and dhclient to bring the network
>> up and reacquire IP.
>>
>> Anyone experiencing similar a issue?
>
> Yes, I saw the same behavior on an IBM Thinkpad T42 UP running i386
> master. Killing the dhclient helps.
>
> Cheers
>
> Matthias
>
Recent master kernel -and- world?
If this is seen again, truss or ktrace output of dhclient would be helpful.
If you could install a debug dhclient now, in the event it happens
again you could induce a core (or at least gdb the process and get a
trace), that might prove helpful as well.
Why, specifically, does the network stop working, if you can identify
it? Default route? resolv.conf overwritten?
Thanks,
Sam
Updated by matthias over 2 years ago
Hi,
* Samuel J. Greear wrote:
>
> Recent master kernel -and- world?
I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct 5
22:26:32 CEST 2010. No time yet to update the box to latest master.
It's an old UP machine, so building world/kernel takes ages (even with
quick).
> If this is seen again, truss or ktrace output of dhclient would be helpful.
I attached truss to the running dhclient, and it shows thousands of poll calls:
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
This sequence keeps repeating forever.
More debugging tomorrow, I'm currently short of time.
Cheers
Matthias
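For anyone reading the truss output above: each poll() is called with 2 descriptors and a timeout of 0x474a0 ms (292000 ms, i.e. 292 seconds), yet it returns 1 immediately every time. That is the signature of a descriptor with a pending condition that the caller never consumes. The following is a minimal sketch that reproduces the same pattern with a pipe; it is not dhclient code, and `count_immediate_wakeups` is a hypothetical helper name chosen for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical reproduction helper, not taken from dhclient.
 * Polls a descriptor whose pending condition is never consumed and
 * returns how many of `iterations` poll() calls reported an event.
 * Returns -1 if the pipe cannot be created.
 */
int count_immediate_wakeups(int iterations, int timeout_ms)
{
    int pipefd[2];
    struct pollfd pfd;
    int wakeups = 0;

    assert(iterations >= 0);
    if (pipe(pipefd) != 0)
        return -1;

    /* Closing the write end leaves a permanent hangup/EOF on the read end. */
    close(pipefd[1]);

    pfd.fd = pipefd[0];
    pfd.events = POLLIN;

    for (int i = 0; i < iterations; i++) {
        pfd.revents = 0;
        /*
         * Nothing in this loop reads from or closes the descriptor, so
         * the condition persists and poll() never sleeps, no matter how
         * long the timeout is.
         */
        if (poll(&pfd, 1, timeout_ms) > 0)
            wakeups++;
    }

    close(pipefd[0]);
    return wakeups;
}
```

Calling `count_immediate_wakeups(1000, 0x474a0)` should finish almost instantly with every iteration counted as a wakeup, which is the same busy loop truss shows, only with a pipe standing in for whatever descriptor dhclient is polling.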
Updated by sjg over 2 years ago
On Wed, Oct 27, 2010 at 10:31 AM, Matthias Schmidt
<matthias@dragonflybsd.org> wrote:
> Hi,
>
> * Samuel J. Greear wrote:
>>
>> Recent master kernel -and- world?
>
> I'm running DragonFly v2.7.3.1236.g0b684-DEVELOPMENT #1: Tue Oct 5
> 22:26:32 CEST 2010. No time yet to update the box to latest master.
> Its an old UP machine, so building world/kernel takes ages (even with
> quick).
>
>> If this is seen again, truss or ktrace output of dhclient would be helpful.
>
> I attached truss and you see thousands of poll messages:
>
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
> poll(0xbfbff7e4,0x2,0x474a0) = 1 (0x1)
> gettimeofday(0xbfbff7bc,0x0) = 0 (0x0)
>
> This sequence keeps repeating forever.
>
> More debugging tomorrow, I'm currently short of time.
>
> Cheers
>
> Matthias
>
I patched a known infinite loop and another possible infinite loop on Sept. 9.
http://gitweb.dragonflybsd.org/dragonfly.git/commit/a05d48e6c7b12ad7abca8e3a3264413ac6f54003
http://gitweb.dragonflybsd.org/dragonfly.git/commit/db2a08265dc67538d417f14ccdc9fdef6fb0d60b
So if it's newer than that there's definitely still a problem. Note
that there are two poll(2) loops in dhclient, make sure you're looking
at the right one if you dig into it. Probably it is the one in
dispatch.c returning -1 and EAGAIN, the question is why.
It looks like it would be much more efficient, also, if we moved the
call to time() behind the error check. (There went my attempt at humor
for the day).
Sam
Updated by sjg over 2 years ago
On Wed, Oct 27, 2010 at 4:38 PM, Samuel J. Greear <sjg@evilcode.net> wrote:
> So if it's newer than that there's definitely still a problem. Note
> that there are two poll(2) loops in dhclient, make sure you're looking
> at the right one if you dig into it. Probably it is the one in
> dispatch.c returning -1 and EAGAIN, the question is why.
>
I obviously didn't look at the truss output very hard. poll() is
returning 1, not -1, so it is probably reporting POLLERR on that socket
and falling through the loop. We will need to determine whether bpf is
broken or unhappy in some fashion, or whether it is a real error
condition that we should be handling in dhclient.
Sam
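To illustrate what "handling it" could look like: a poll loop that only tests for POLLIN will fall through and spin if the descriptor reports POLLERR, POLLHUP, or POLLNVAL instead. The sketch below classifies `revents` so the caller can stop polling a failed descriptor. It is an assumption about one possible approach, not the actual dispatch.c logic; `fd_state`, `classify_revents`, and `first_failed_fd` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical classification of a descriptor after poll() returns. */
enum fd_state { FD_IDLE, FD_READABLE, FD_FAILED };

/*
 * Error conditions take priority over readability, so a descriptor
 * that reports POLLIN together with POLLERR is treated as failed.
 * A hangup with no readable data is also treated as failed.
 */
enum fd_state classify_revents(short revents)
{
    if (revents & (POLLERR | POLLNVAL))
        return FD_FAILED;
    if (revents & POLLIN)
        return FD_READABLE;
    if (revents & POLLHUP)
        return FD_FAILED;
    return FD_IDLE;
}

/*
 * Returns the index of the first failed descriptor in fds, or -1 if
 * none failed. A caller could log it and close or reopen that
 * descriptor instead of polling it again.
 */
int first_failed_fd(const struct pollfd *fds, size_t nfds)
{
    assert(fds != NULL || nfds == 0);
    for (size_t i = 0; i < nfds; i++) {
        if (classify_revents(fds[i].revents) == FD_FAILED)
            return (int)i;
    }
    return -1;
}
```

Whether dhclient should recover from such a state or simply exit depends on why bpf reports the error in the first place, which is still the open question here.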
Updated by jaydg over 1 year ago
- Description updated
It still happens on current master, but only on wireless devices. Wired interfaces are never affected.
I tried running a debug build of dhclient, but it seems to be permanently stuck in poll().