Bug #3187

closed

Kernel: multiple swap pager messages during hammer cleanup

Added by Anonymous over 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 05/23/2019
Due date:
% Done: 0%
Estimated time:

Description

On master at commit a19800f223, running hammer cleanup produces dozens of messages like the one below and makes the machine unreachable through network services, SSH in particular, although ping still works:

May 23 19:20:14 <kern.crit> boojum kernel: swap_pager: indefinite wait buffer: bp 0xfffff800735334d8 offset: 5092941824, size: 4096 m=0xfffff8005fee7580 busy=e0000000 flags=00004080

This behavior was also observed at commit d29a2431b3, so the cause was probably introduced earlier than that. When I ran hammer cleanup today, the last thing I noticed running before my SSH connection died was hammer reblock-btree. This was on an 8TB encrypted volume.
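
To help correlate these messages with cleanup runs, the kernel line quoted above can be tallied from a saved log. This is only a rough sketch: the pattern is modeled on the single message shown in this report, and the function names (`parse_swap_wait`, `count_per_minute`) are hypothetical helpers, not part of any DragonFly tool. Other kernel versions may format the line differently.

```python
import re
from collections import Counter

# Pattern modeled on the one message quoted in this report (assumption:
# other occurrences share the same layout).
SWAP_WAIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+<kern\.crit>\s+"
    r"(?P<host>\S+)\s+kernel:\s+"
    r"swap_pager: indefinite wait buffer: bp (?P<bp>0x[0-9a-fA-F]+) "
    r"offset: (?P<offset>\d+), size: (?P<size>\d+)"
)


def parse_swap_wait(line):
    """Return the parsed fields of a matching log line, or None."""
    match = SWAP_WAIT_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["offset"] = int(fields["offset"])
    fields["size"] = int(fields["size"])
    return fields


def count_per_minute(lines):
    """Count matching messages per minute, keyed by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        fields = parse_swap_wait(line)
        if fields is not None:
            # Drop the trailing ':SS' so messages group by minute.
            counts[fields["stamp"][:-3]] += 1
    return counts
```

Feeding the lines of a saved log file to `count_per_minute` gives a per-minute count that can be compared against the time the nightly cleanup starts.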


Files

dmesg.boot (88 KB), added by Anonymous, 08/13/2019 02:00 PM
fstab (976 Bytes), added by Anonymous, 08/13/2019 02:03 PM
Actions #1

Updated by Anonymous over 5 years ago

hammer cleanup is unusable until this gets fixed. I've disabled the script in periodic daily for now.

Updated by Anonymous over 5 years ago

Adding some machine info:

Samsung 1TB SSD containing:
- 60GB encrypted swap (via fstab crypt option)
- encrypted root

Data drives:
- 10TB SATA encrypted primary
- 8TB SATA encrypted backup (hammer-mirrored PFSes from primary drive)

Actions #3

Updated by dillon over 5 years ago

  • Assignee set to dillon

I am going to guess that it is an interaction with the encrypted swap that is causing the problem.

Try turning off encryption on your swap partition (in your /etc/fstab), reboot, and see if the problem reoccurs.

-Matt

Actions #4

Updated by Anonymous about 5 years ago

This continues to fail on a regular basis, and turning off encrypted swap didn't help. Unfortunately, the nature of this failure is such that it won't allow me to get a core dump. The only fix is to reboot the machine, which I have to do several times a week.

Actions #5

Updated by Anonymous about 5 years ago

I was finally able to get a core dump while this was happening. I've uploaded the files to crash/bug-3187 on leaf.

Actions #6

Updated by Anonymous almost 5 years ago

This issue is not limited to hammer1. I've replaced all but one of my disks with SSDs, all my volumes are hammer2 now, and I've seen this same thing happen 3 times since then. Twice it appeared to happen during the nightly cleanup, but once was during the day with only moderate disk activity. This is a lot less frequent than when using hammer1, so it's going to be even more difficult to track down.

Actions #7

Updated by Anonymous almost 5 years ago

This is on the 5.6.2 release.

Actions #8

Updated by Anonymous almost 5 years ago

  • Status changed from New to Resolved

I haven't seen an occurrence of this for a while. I'm guessing that this commit fixed it?
git: kernel - Fix indefinite wait buffer bug with encrypted disks
