Bug #3056

closed

HAMMER2 corruption - nonexistent files

Added by arcade@b1t.name about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Kernel
Target version: -
Start date: 09/16/2017
Due date:
% Done: 0%
Estimated time:

Description

After one crash (like a week ago) I got one dir damaged:

find . | wc -l

144

find . -type f |& wc -l

142

find .


./standard/module.time.9.cfg
./standard/module.notification.3.cfg
./standard/e_bindings.6.cfg
./standard/module.gadman.3.cfg
./standard/systray.5.cfg
./standard/module.tasks.5.cfg
./standard/e_bryces.5.cfg

find . -type f


find: ./standard/module.conf.9.cfg: No such file or directory
find: ./standard/module.music_control.7.cfg: No such file or directory
find: ./standard/module.everything-apps.3.cfg: No such file or directory
find: ./standard/module.ibar.7.cfg: No such file or directory
find: ./standard/module.syscon.6.cfg: No such file or directory

No messages on the system console, even with vfs.hammer2.debug=1. Not fixed by cleanup/bulkfree. Current kernel: 2e8a93d8402e.
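The mismatch between the two find runs above (names that the directory lists but that cannot be stat'ed) can also be checked programmatically. The following is only a minimal sketch, assuming the damage shows up as ENOENT from lstat() on a name returned by readdir; the function name find_dangling and its lstat parameter are hypothetical and exist only for illustration and testing.

```python
import os
import stat


def find_dangling(root, lstat=os.lstat):
    """Return paths that a directory lists but whose inode cannot be stat'ed.

    The lstat parameter is a hypothetical hook so the check can be
    exercised on a healthy filesystem.
    """
    dangling = []
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                try:
                    st = lstat(entry.path)
                except FileNotFoundError:
                    # The directory returned this name, but lstat reports ENOENT.
                    dangling.append(entry.path)
                    continue
                if stat.S_ISDIR(st.st_mode):
                    stack.append(entry.path)
    return sorted(dangling)
```

On a healthy filesystem this should return an empty list; any path it reports corresponds to a "No such file or directory" line from find.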


Files

core.txt.31 (210 KB), arcade@b1t.name, 09/19/2017 07:41 AM

Actions #1

Updated by dillon about 7 years ago

  • Status changed from New to In Progress
  • Assignee set to dillon

I believe I fixed that one in the latest master with the sideq commit 7220df841c67. It won't fix the existing disconnect between inode nlinks and directory entries, but it should prevent new occurrences. We have not yet confirmed that the fix actually prevents such things from happening (testing now).

I will come up with a way to scrap the dead directory entries in an upcoming commit. It might be possible to just unlink("path") already, but 'rm' won't do it because 'rm' calls access() and lstat() first (which need the inode to exist).

-Matt
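To illustrate the point about 'rm' checking the inode first: a direct unlink() call skips the access() and lstat() steps. This is only a sketch; whether it actually removes a dead HAMMER2 directory entry is unconfirmed in this thread ("it might be possible"), and the helper name unlink_without_stat is hypothetical.

```python
import errno
import os


def unlink_without_stat(path):
    """Call unlink(2) on path directly; return 0 on success, else the errno."""
    try:
        # No access() or lstat() beforehand, unlike 'rm'.
        os.unlink(path)
    except OSError as exc:
        return exc.errno
    return 0
```

A return value of errno.ENOENT would mean the kernel itself refuses the unlink, in which case a dedicated tool is needed.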

Actions #2

Updated by dillon about 7 years ago

I have added a new directive to the hammer2 utility which can remove the dead directory entries: 'hammer2 destroy <path>'. Use very carefully. The manual page has been updated. It needs a new kernel (w/updated hammer2 VFS) and a new hammer2 utility. Still considered alpha.

-Matt

Actions #3

Updated by arcade@b1t.name about 7 years ago

Sorry, can't find the file:

===> sbin/hammer2
make4: make4: don't know how to make cmd_destroy.c. Stop

make4: stopped in /usr/src/sbin/hammer2
*** [_SUBDIR_depend_hammer2] Error code 2

Actions #4

Updated by arcade@b1t.name about 7 years ago

Today I tested hammer2 destroy and it worked fine, yet 5 minutes later the host crashed (coredump attached). The host then crashed again while attempting to write that crash to /var/crash (with the same traceback). Probably unrelated.

Actions #5

Updated by arcade@b1t.name about 7 years ago

  • Status changed from Resolved to Feedback

Oops, it's not over yet.

I hadn't paid attention to other parts of the fs, and some wrong files were still present. I cleaned the whole disk of incorrect files (god bless backups) and found some more weird things:

-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid

If I do `hammer2 destroy filename` one of them is destroyed.
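A listing like the one above, where the same name appears several times in one directory, can be spotted by counting the names a directory returns. This is a minimal sketch, assuming the duplicates are visible to an ordinary directory read; the function name duplicate_names and its listdir parameter are hypothetical.

```python
import os
from collections import Counter


def duplicate_names(directory, listdir=os.listdir):
    """Return {name: count} for names the directory lists more than once.

    The listdir parameter is a hypothetical hook so the check can be
    exercised on a healthy filesystem, where duplicates never occur.
    """
    counts = Counter(listdir(directory))
    return {name: n for name, n in counts.items() if n > 1}
```

For the listing above it would be expected to report slim.pid, smartd.pid and syslog.pid with a count of 4 each.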

I also noticed that after a crash, empty files may appear in place of newly created files.

Will clean up the drive and check how it would work.

Actions #6

Updated by dillon about 7 years ago

I hit the cluster_write() crash too, and it should now be fixed in master. It turned out to be a bug in devfs that only hammer2 tickles due to the way hammer2 uses cluster_write().

Also fixed a number of other issues this morning (see the git logs), including a corruption bug that can be triggered by a crash or panic.

-Matt

Actions #7

Updated by arcade@b1t.name about 7 years ago

  • Status changed from Feedback to Resolved

Yeah, that's much better. Now I can compile a new kernel without hitting a crash, and the host has been stable again for a couple of hours.
