Bug #2365

closed

Hammer pfs-destroy and prune-everything can cause network loss

Added by Anonymous over 12 years ago. Updated almost 12 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 05/09/2012
Due date:
% Done: 0%
Estimated time:

Description

Fairly consistently, when I run hammer prune-everything, at some point in the process my ssh session hangs and does not recover, and the machine becomes unreachable on the network. It eventually returns to normal and I can reconnect to it. Today, I ran a hammer pfs-destroy on a 1.3TB pfs and the same thing happened. While the network was out, the system was writing these messages to the log:

swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 6725648384, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
swap_pager: indefinite wait buffer: offset: 861982720, size: 4096
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 6725648384, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6642294784, size: 4096
swap_pager: indefinite wait buffer: offset: 1027108864, size: 4096
swap_pager: indefinite wait buffer: offset: 6536355840, size: 4096
swap_pager: indefinite wait buffer: offset: 6500122624, size: 4096
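
Several offsets repeat in the messages above, so a tally per offset may make the pattern easier to see. The following is only a rough sketch in Python, assuming the messages have been saved as plain text in exactly the format shown; the function name tally_offsets and the three-line sample are made up for illustration and are not part of any DragonFly tool.

```python
import re
from collections import Counter

# Matches the console message format quoted in this report.
PATTERN = re.compile(
    r"swap_pager: indefinite wait buffer: offset: (\d+), size: (\d+)"
)


def tally_offsets(log_text):
    """Count how many times each swap offset appears in the log text."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample: three lines copied from the log in this report.
sample = """\
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
swap_pager: indefinite wait buffer: offset: 23187279872, size: 4096
swap_pager: indefinite wait buffer: offset: 6489661440, size: 4096
"""

if __name__ == "__main__":
    # Print offsets from most to least frequently reported.
    for offset, n in tally_offsets(sample).most_common():
        print(f"offset {offset}: {n} message(s)")
```

A small number of offsets that keep recurring would suggest the same few swap buffers are stuck waiting, rather than new ones failing each time.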


Files

indefinite_wait_buffer.png (73.8 KB) swildner, 08/25/2012 01:50 AM
indefinite_wait_buffer2.png (49.8 KB) swildner, 08/25/2012 01:50 AM
indefinite_wait_buffer3.png (65.6 KB) swildner, 08/25/2012 01:50 AM
Actions #1

Updated by swildner over 12 years ago

Is this on i386 or x86_64? I've seen the issue too here, although not triggered by prune-everything or pfs-destroy. It's just a thing that happens from time to time on my i386 box.

http://87.78.98.243/tmp/IMG_20120424_220035.jpg

In my case the trigger is not clear.

And I've never seen it on any x86_64 box.

Actions #2

Updated by Anonymous over 12 years ago

This is on x86_64 AMD. I should add that those swap_pager messages were happening during the pfs-destroy, but I don't know whether or not that was the trigger. More concerning to me is the loss of network connectivity.

Actions #3

Updated by swildner over 12 years ago

I just had the "indefinite wait buffer" in an i386 VM and I captured the beginnings of it. I was building in pkgsrc and then thought I should clean up my HAMMER, so I ran /etc/periodic/daily/160.clean-hammer and it went fine until it got to /home (second image). Although none of the package building took place on /home (afaics), it hung there, and soon the "indefinite wait buffer" messages started to appear on the console (first image). At first the offsets were all the same, but later on there were several different ones (third image).

Maybe this gives some better clue? So far I had never witnessed it when it happened.

Actions #4

Updated by Anonymous over 12 years ago

I upgraded my system (x86_64) to include all the recent scheduler changes and tried hammer prune-everything, and for the first time in a long time it didn't kill the network. I did notice a slowdown, but that was all. Could those changes have fixed this problem? I'll do more testing.

Actions #5

Updated by Anonymous almost 12 years ago

As recently as one month ago, this was still a problem. In fact, it happened very consistently with hammer prune-everything. Tonight, I ran it twice and network connections to the box didn't drop. Could recent commits have fixed this?

Actions #6

Updated by Anonymous almost 12 years ago

  • Status changed from New to Closed

After a good bit of testing, I'm ready to declare this resolved. Again, as of approximately one month ago this was still very much an issue, so some recent commit or combination of commits must have fixed it.
