Bug #3061
HAMMER2. Hang up, reboot, and crash. (closed)
Description
I started the build as
------
nice -n 9 make -j 8 buildworld
-----
@USR-SRC is mounted at /usr/src, @USR-OBJ is mounted at /usr/obj.
The compilation stalled, so I pressed reset. After the reboot the system dumped immediately.
After that I was able to boot, but the PFSes were gone.
Each attempt to mount any of them led to the following errors:
-----------
Sep 19 17:23:31 fly kernel: hammer2_mount
Sep 19 17:23:31 fly kernel: hammer2_mount: dev="/dev/serno/WD-WCC2EP295836.s1d" label="USR-OBJ" rdonly=0
Sep 19 17:23:31 fly kernel: hammer2: using volume header #3
Sep 19 17:23:31 fly kernel: alloc spmp 0xffffff8525bc0000 tid 000000000000c00b
Sep 19 17:23:31 fly kernel: chain 00000068a0f4480a.01 key=0000000000000000 meth=31 CHECK FAIL (flags=00144002, bref/data f84c9b21bc9ef301/1e55118462e960c9)
Sep 19 17:23:31 fly kernel: hammer2_mount: error Check Error reading super-root
Sep 19 17:23:31 fly kernel: hammer2_unmount hmp=0xffffff852613a000 mount_count=0
Sep 19 17:23:31 fly kernel: unmount hmp 0xffffff852613a000 remove spmp 0xffffff8525bc0000
Sep 19 17:23:31 fly kernel: unmount hmp 0xffffff852613a000 last ref to PMP=0xffffff8525bc0000
Sep 19 17:23:31 fly kernel: pfsfree: 0xffffff8525bc0000 lrucount=0
Sep 19 17:23:31 fly kernel: hammer2_unmount(A): devvp /dev/serno/WD-WCC2EP295836.s1d rbdirty 0 ronly=0
Sep 19 17:23:31 fly kernel: hammer2_unmount(B): devvp /dev/serno/WD-WCC2EP295836.s1d rbdirty 0
Sep 19 17:23:31 fly kernel: v-chain 0xffffff852613a4c0.255 0000000000000000/0 mir=000000000000c00b
Sep 19 17:23:31 fly kernel: [00002000] (?) refs=1
Sep 19 17:23:31 fly kernel: f-chain 0xffffff852613a640.254 0000000000000000/0 mir=000000000000c00a
Sep 19 17:23:31 fly kernel: [00000004] (?) refs=0
-----------
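To spot this failure signature in a larger log, a filter along the following lines may help. It is only a sketch: `check_fails` is a hypothetical helper name, and it assumes the kernel message layout is exactly as in the excerpt above (the word `chain` followed by the chain identifier, and a `meth=` field on the same line).

```shell
# Hypothetical helper: read kernel log text on stdin and print, for every
# "CHECK FAIL" line, the chain identifier and the value of the meth= field.
# Assumes the field layout seen in the log excerpt of this report.
check_fails() {
    awk '/CHECK FAIL/ {
        chain = meth = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "chain") chain = $(i + 1)          # token after "chain"
            if ($i ~ /^meth=/) meth = substr($i, 6)      # text after "meth="
        }
        print chain, meth
    }'
}

# Example use (the log path is an assumption, adjust to your syslog setup):
#   check_fails < /var/log/messages
```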
Updated by dillon about 7 years ago
- Assignee set to dillon
I think I may have fixed this one last night in master (commit id through to a964af6f47472). I believe the stall should be fixed too (19808ac9def). There was a bug in the flush code that could catch some indirect block management in the middle of moving elements into or out of an indirect block, causing a damaged topology to be committed to media. This bug self-corrected during a normal shutdown, halt, or reboot, but not if the machine crashed or underwent a hard reset.
Unfortunately, once damaged, the topology pretty much can't be repaired and the filesystem needs to be newfs_hammer2'd.
So I would say, update, reinstall, and keep watch. If it happens again with a kernel with a commitid of a964af6f47472 or later then we need to look at it more closely.
-Matt
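One way to check whether a source tree already contains the fix is to ask git whether the commit is an ancestor of the current checkout. This is a rough sketch, not an official procedure: `contains_fix` is a hypothetical helper, it assumes /usr/src is a git checkout, and it tests the source tree rather than the running kernel, so it only helps if the kernel was built and installed from that tree.

```shell
# Hypothetical helper: succeeds if commit $2 is an ancestor of (or equal to)
# HEAD in the git checkout at $1. Uses `git merge-base --is-ancestor`, which
# exits 0 when the first commit is an ancestor of the second.
contains_fix() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}

# Assumption: the system sources live in a git checkout at /usr/src.
if contains_fix /usr/src a964af6f47472; then
    echo "source tree includes a964af6f47472"
else
    echo "source tree predates a964af6f47472 (or is not a git checkout)"
fi
```

The same call with 19808ac9def checks for the stall fix.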
Updated by yellowrabbit2010 about 7 years ago
Updated, reinstalled, and recreated all PFSes; so far all is good. Except that I once again ran into http://bugs.dragonflybsd.org/issues/3057 :) I'll watch for another couple of days.
Updated by yellowrabbit2010 about 7 years ago
During this time, the system survived occasional reboots quite well. The last question here is: if I see such messages at boot, does this mean that I need to recreate the HAMMER2 partition?
---
Sep 23 20:19:17 fly kernel: reconnect to cluster: nc=1 focus=0
Sep 23 20:19:17 fly kernel: not a local device mount
---
They are not harmless: mount -a returns an error code and the /etc/rc.d/mountcritlocal script does not work properly.
Perhaps I specified them incorrectly in /etc/fstab?
--
/dev/serno/WD-WCC2EP295836.s1d@LOCAL /mnt/aux-hdd hammer2 noatime,rw 1 1
@DOWNLOADS /mnt/dl hammer2 noatime,rw 0 0
@QEMU-IMGS /mnt/qemu-imgs hammer2 noatime,rw 0 0
---
Updated by yellowrabbit2010 about 7 years ago
- Status changed from New to Resolved
The above problems are not observed anymore.