Bug #3187
closed
Kernel: multiple swap pager messages during hammer cleanup
Added by Anonymous over 5 years ago.
Updated almost 5 years ago.
Description
On master, commit a19800f223, running hammer cleanup produces dozens of messages like the following and makes the machine unreachable over network services, SSH in particular, although ping still works:
May 23 19:20:14 <kern.crit> boojum kernel: swap_pager: indefinite wait buffer: bp 0xfffff800735334d8 offset: 5092941824, size: 4096 m=0xfffff8005fee7580 busy=e0000000 flags=00004080
This behavior was also observed at commit d29a2431b3, so whatever is causing it probably predates that commit. When I ran hammer cleanup today, the last thing I noticed running before my SSH connection died was hammer reblock-btree. This was on an 8TB encrypted volume.
hammer cleanup is unusable until this gets fixed. I've disabled the script in periodic daily for now.
Adding some machine info:
Samsung 1TB SSD containing:
- 60GB encrypted swap (via fstab crypt option)
- encrypted root
Data drives:
- 10TB SATA encrypted primary
- 8TB SATA encrypted backup (hammer-mirrored PFSes from primary drive)
My guess is that an interaction with the encrypted swap is causing the problem.
Try turning off encryption on your swap partition (in your /etc/fstab), reboot, and see if the problem recurs.
-Matt
This continues to fail on a regular basis, and turning off encrypted swap didn't help. Unfortunately, the nature of this failure is such that it won't let me get a core dump. The only fix is to reboot the machine, which I have to do several times a week.
I was finally able to get a core dump while this was happening. I've uploaded the files to crash/bug-3187 on leaf.
This issue is not limited to hammer1. I've replaced all but one of my disks with SSDs, all my volumes are hammer2 now, and I've seen this same thing happen 3 times since then. Twice it appeared to happen during the nightly cleanup, but once it happened during the day with only moderate disk activity. It happens a lot less frequently than with hammer1, so it's going to be even more difficult to track down.
This is on the 5.6.2 release.
- Status changed from New to Resolved
I haven't seen an occurrence of this for a while. I'm guessing that this commit fixed it?
git: kernel - Fix indefinite wait buffer bug with encrypted disks