Bug #2365
closedHammer pfs-destroy and prune-everything can cause network loss
0%
Description
Fairly consistently, when I run hammer prune-everything, at some point in the process my ssh session will stop and not recover and the machine becomes unavailable on the network. It eventually returns to normal and I can reconnect to it. Today, I ran a hammer pfs-destroy on a 1.3TB pfs and the same thing happened. While the network was out, it was putting these messages in the log:
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 6725648384, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
swap_pager: indefinite wait buffer: offset: 861982720, size: 4096
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 6725648384, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 1027108864, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
Files
Updated by swildner over 12 years ago
Is this on i386 or x86_64? I've seen the issue too here, although not triggered by prune-everything or pfs-destroy. It's just a thing that happens from time to time on my i386 box.
http://87.78.98.243/tmp/IMG_20120424_220035.jpg
In my case the trigger is not clear.
And I've never seen it on any x86_64 box.
Updated by Anonymous over 12 years ago
This is on x86_64 AMD. I should add that those swap_pager messages were happening during the pfs-destroy, but I don't know whether or not that was the trigger. More concerning to me is the loss of network connectivity.
Updated by swildner about 12 years ago
- File indefinite_wait_buffer.png indefinite_wait_buffer.png added
- File indefinite_wait_buffer2.png indefinite_wait_buffer2.png added
- File indefinite_wait_buffer3.png indefinite_wait_buffer3.png added
I just had the "indefinite wait buffer" in an i386 VM and I captured the beginnings of it. I was building in pkgsrc and then thought I should cleanup my HAMMER so I ran /etc/periodic/daily/160.clean-hammer and it went fine until it got to /home (second image). Although nothing of the package building took place on /home (afaics) it hung there and soon the "indefinite wait buffer" messages started to appear on the console (first image). First the offsets were all the same but later on it was several (third image).
Maybe this gives some better clue? So far I had never witnessed it when it happened.
Updated by Anonymous about 12 years ago
I upgraded my system (x86_64) to include all the recent scheduler changes and tried hammer prune-everything and for the first time in a long time, it didn't kill the network. I did noticed a slowdown, but that was all. Could those changes have fixed this problem? I'll do more testing.
Updated by Anonymous over 11 years ago
As recent as one month ago, this was still a problem. In fact, it happened very consistently with hammer prune-everything. Tonight, I ran it twice and network connections to the box didn't drop. Could recent commits have fixed this?
Updated by Anonymous over 11 years ago
- Status changed from New to Closed
After a good bit of testing, I'm ready to declare this resolved. Again, as of approximately one month ago this was still very much an issue, so some recent commit or combination of commits must have fixed it.