Bug #3061

HAMMER2: hang, reboot, and crash

Added by yellowrabbit2010 about 1 month ago. Updated 27 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 09/19/2017
Due date:
% Done: 0%


Description

I started the build with:
------
nice -n 9 make -j 8 buildworld
------

@USR-SRC is mounted at /usr/src, and @USR-OBJ is mounted at /usr/obj.
The compilation stalled, so I pressed reset; after the reboot the system immediately crashed and dumped core.
After that I was able to boot, but the PFSes were gone.

Every attempt to mount any of them produced the following errors:
-----------
Sep 19 17:23:31 fly kernel: hammer2_mount
Sep 19 17:23:31 fly kernel: hammer2_mount: dev="/dev/serno/WD-WCC2EP295836.s1d" label="USR-OBJ" rdonly=0
Sep 19 17:23:31 fly kernel: hammer2: using volume header #3
Sep 19 17:23:31 fly kernel: alloc spmp 0xffffff8525bc0000 tid 000000000000c00b
Sep 19 17:23:31 fly kernel: chain 00000068a0f4480a.01 key=0000000000000000 meth=31 CHECK FAIL (flags=00144002, bref/data f84c9b21bc9ef301/1e55118462e960c9)
Sep 19 17:23:31 fly kernel: hammer2_mount: error Check Error reading super-root
Sep 19 17:23:31 fly kernel: hammer2_unmount hmp=0xffffff852613a000 mount_count=0
Sep 19 17:23:31 fly kernel: unmount hmp 0xffffff852613a000 remove spmp 0xffffff8525bc0000
Sep 19 17:23:31 fly kernel: unmount hmp 0xffffff852613a000 last ref to PMP=0xffffff8525bc0000
Sep 19 17:23:31 fly kernel: pfsfree: 0xffffff8525bc0000 lrucount=0
Sep 19 17:23:31 fly kernel: hammer2_unmount(A): devvp /dev/serno/WD-WCC2EP295836.s1d rbdirty 0 ronly=0
Sep 19 17:23:31 fly kernel: hammer2_unmount(B): devvp /dev/serno/WD-WCC2EP295836.s1d rbdirty 0
Sep 19 17:23:31 fly kernel: v-chain 0xffffff852613a4c0.255 0000000000000000/0 mir=000000000000c00b
Sep 19 17:23:31 fly kernel: [00002000] (?) refs=1
Sep 19 17:23:31 fly kernel: f-chain 0xffffff852613a640.254 0000000000000000/0 mir=000000000000c00a
Sep 19 17:23:31 fly kernel: [00000004] (?) refs=0
-----------
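The CHECK FAIL line appears to report a mismatch between the check value recorded in the block reference ("bref") and the value computed from the data actually read from disk ("data"). Because the super-root failed this check, the mount was refused. The following is a minimal Python sketch of that kind of verification, assuming a hypothetical 64-bit BLAKE2b check in place of the check method HAMMER2 really uses. The names check_value, read_block and CheckError are illustrative only and are not HAMMER2 functions.

```python
import hashlib


class CheckError(Exception):
    """Raised when a block's data does not match its recorded check value."""


def check_value(data: bytes) -> int:
    # Hypothetical stand-in check: a 64-bit BLAKE2b digest.
    # HAMMER2 uses its own check methods; this only illustrates the idea.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def read_block(stored_check: int, data: bytes) -> bytes:
    """Return data only if it matches the check value stored in the parent reference."""
    computed = check_value(data)
    if computed != stored_check:
        # Report stored vs computed values, loosely mirroring the kernel's "bref/data" pair.
        raise CheckError(f"CHECK FAIL (bref/data {stored_check:016x}/{computed:016x})")
    return data


if __name__ == "__main__":
    block = b"super-root inode"
    ref = check_value(block)           # value recorded when the block was written
    print(read_block(ref, block))      # intact data passes the check
    try:
        read_block(ref, b"damaged contents")  # data no longer matches its reference
    except CheckError as err:
        print(err)
```

In this model, a damaged topology committed to media leaves references pointing at data whose check value no longer matches, so every read through those references fails in the same way.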

core.txt.0 (239 KB) yellowrabbit2010, 09/19/2017 12:39 AM

History

#1 Updated by dillon about 1 month ago

  • Assignee set to dillon

I think I may have fixed this one last night in master (commits up to and including a964af6f47472). I also believe the stall is fixed (19808ac9def). There was a bug in the flush code that could catch indirect-block management in the middle of moving elements into or out of an indirect block, causing a damaged topology to be committed to media. This bug self-corrected during a normal shutdown, halt, or reboot, but not if the machine crashed or underwent a hard reset.

Unfortunately, once damaged, the topology pretty much can't be repaired and the filesystem needs to be newfs_hammer2'd.

So I would say: update, reinstall, and keep watch. If it happens again with a kernel at commit a964af6f47472 or later, then we need to look at it more closely.

-Matt

#2 Updated by yellowrabbit2010 about 1 month ago

Updated, reinstalled, and recreated all PFSes; so far all is good, except that I once again ran into http://bugs.dragonflybsd.org/issues/3057 :) I'll keep watching for another couple of days.

#3 Updated by yellowrabbit2010 about 1 month ago

During this time, the system has survived occasional reboots quite well. One last question: if I see messages like these at boot, does it mean I need to recreate the HAMMER2 partition?
---
Sep 23 20:19:17 fly kernel: reconnect to cluster: nc=1 focus=0
Sep 23 20:19:17 fly kernel: not a local device mount
---
They are not harmless: mount -a returns an error code, and the /etc/rc.d/mountcritlocal script does not work properly.
Perhaps I am specifying them incorrectly in /etc/fstab?
---
/dev/serno/WD-WCC2EP295836.s1d@LOCAL /mnt/aux-hdd hammer2 noatime,rw 1 1
@DOWNLOADS /mnt/dl hammer2 noatime,rw 0 0
@QEMU-IMGS /mnt/qemu-imgs hammer2 noatime,rw 0 0
---

#4 Updated by yellowrabbit2010 27 days ago

  • Status changed from New to Resolved

The problems above no longer occur.
