Bug #3056

HAMMER2 corruption - nonexistent files

Added by arcade@b1t.name 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Kernel
Target version: -
Start date: 09/16/2017
Due date:
% Done: 0%


Description

After one crash (like a week ago) I got one dir damaged:

> find . | wc -l
144

> find . -type f |& wc -l
142

> find .

./standard/module.time.9.cfg
./standard/module.notification.3.cfg
./standard/e_bindings.6.cfg
./standard/module.gadman.3.cfg
./standard/systray.5.cfg
./standard/module.tasks.5.cfg
./standard/e_bryces.5.cfg

> find . -type f

find: ./standard/module.conf.9.cfg: No such file or directory
find: ./standard/module.music_control.7.cfg: No such file or directory
find: ./standard/module.everything-apps.3.cfg: No such file or directory
find: ./standard/module.ibar.7.cfg: No such file or directory
find: ./standard/module.syscon.6.cfg: No such file or directory

No messages on the system console even with vfs.hammer2.debug=1. Not fixed by cleanup/bulkfree. Current kernel: 2e8a93d8402e.
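The mismatch between the two find runs above (names that are listed but cannot be stat'ed) can be checked for directly. The following is a minimal sketch, not part of the original report: the function name `find_dangling_entries` is hypothetical, and it assumes a dangling entry shows up as a name returned by the directory listing for which lstat() fails with "No such file or directory", as in the find output.

```python
import os
import stat


def find_dangling_entries(root):
    """Return paths that appear in a directory listing but cannot be lstat'ed.

    Hypothetical helper: assumes a dead directory entry surfaces as
    FileNotFoundError (ENOENT) from lstat(), as seen in the find output.
    """
    dangling = []
    stack = [root]
    while stack:
        directory = stack.pop()
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                # The name exists in the directory, but no inode backs it.
                dangling.append(path)
                continue
            # Descend into real directories only (lstat does not follow symlinks).
            if stat.S_ISDIR(st.st_mode):
                stack.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    for path in find_dangling_entries("."):
        print(path)
```

On a healthy filesystem this should print nothing; any path it prints is a candidate for closer inspection, not proof of corruption (a file removed between the listing and the lstat() call would look the same).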

core.txt.31 (210 KB) arcade@b1t.name, 09/19/2017 07:41 AM

History

#1 Updated by dillon 3 months ago

  • Status changed from New to In Progress
  • Assignee set to dillon

I believe I fixed that one in the latest master with the sideq commit 7220df841c67. It won't fix the existing disconnect between inode nlinks and directory entries, but it should prevent new occurrences. We have not yet confirmed that the fix actually prevents such things from happening (testing now).

I will come up with a way to scrap the dead directory entries in an upcoming commit. It might be possible to just unlink("path") already, but 'rm' won't do it because 'rm' calls access() and lstat() first (which need the inode to exist).

-Matt
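The idea of calling unlink("path") directly, without the access() and lstat() checks that 'rm' performs first, can be sketched as follows. This is only an illustration: the helper name `unlink_without_stat` is hypothetical, and whether the kernel accepts the unlink for a dead entry is exactly the open question raised in the comment above, so the function reports the errno instead of assuming success.

```python
import os
import sys


def unlink_without_stat(path):
    """Call unlink on path without stat'ing it first.

    Returns 0 on success, or the errno reported by the kernel on failure.
    Hypothetical helper; it does not guarantee that a dead HAMMER2
    directory entry can be removed this way.
    """
    try:
        os.unlink(path)  # goes straight to the unlink system call
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    for target in sys.argv[1:]:
        err = unlink_without_stat(target)
        print(target, "removed" if err == 0 else os.strerror(err))
```

Reporting the error code per path makes it easy to see whether the failure is ENOENT (the inode lookup is still required) or something else.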

#2 Updated by dillon 3 months ago

I have added a new directive to the hammer2 utility which can remove the dead directory entries: 'hammer2 destroy <path>'. Use it very carefully. The manual page has been updated. It needs a new kernel (with the updated hammer2 VFS) and a new hammer2 utility. Still considered alpha.

-Matt

#3 Updated by arcade@b1t.name 3 months ago

Sorry, can't find the file:

===> sbin/hammer2
make[4]: make[4]: don't know how to make cmd_destroy.c. Stop

make[4]: stopped in /usr/src/sbin/hammer2
*** [_SUBDIR_depend_hammer2] Error code 2

#4 Updated by arcade@b1t.name 3 months ago

  • File core.txt.31 added
  • Status changed from In Progress to Resolved

Today I tested hammer2 destroy and it worked fine, yet 5 minutes later the host crashed (core dump attached). The host then crashed again while attempting to write that crash dump to /var/crash (with the same traceback). Probably unrelated.

#5 Updated by arcade@b1t.name 3 months ago

  • Status changed from Resolved to Feedback

Oops, it's not over yet.

I hadn't paid attention to other parts of the filesystem, and some wrong files were still present. I cleaned the whole disk of incorrect files (god bless backups) and found some more weird things:

-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid

If I do `hammer2 destroy filename` one of them is destroyed.
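Repeated names like the slim.pid, smartd.pid and syslog.pid entries in the listing above can be counted per directory. A minimal sketch, assuming the duplicates are visible through an ordinary directory read; the function names `count_duplicates` and `duplicate_names` are hypothetical.

```python
import os
from collections import Counter


def count_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_names(directory):
    """Report names that a single directory read returns more than once."""
    return count_duplicates(os.listdir(directory))


if __name__ == "__main__":
    for name, n in sorted(duplicate_names(".").items()):
        print(f"{name}: listed {n} times")
```

Since each 'hammer2 destroy filename' call was observed to remove one entry, a count of n suggests n - 1 removals to get back to a single entry; that is an inference from the report, not documented behaviour.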

I also noticed that after the crash simple empty files may appear in the place of newly created files.

Will clean up the drive and check how it would work.

#6 Updated by dillon 3 months ago

I hit the cluster_write() crash too, and it should now be fixed in master. It turned out to be a bug in devfs that only hammer2 tickles due to the way hammer2 uses cluster_write().

Also fixed a number of other issues this morning (see the git logs), including a corruption bug that can be triggered by a crash or panic.

-Matt

#7 Updated by arcade@b1t.name 3 months ago

  • Status changed from Feedback to Resolved

Yeah, that's much better. Now I can compile new kernel without hitting a crash and host is stable again for a couple of hours.
