Bug #3056
closed
HAMMER2 corruption - nonexistent files
Added by arcade@b1t.name about 7 years ago.
Updated about 7 years ago.
Description
After a crash about a week ago, one directory was left damaged:
find . | wc -l
144
find . -type f |& wc -l
142
find .
…
./standard/module.time.9.cfg
./standard/module.notification.3.cfg
./standard/e_bindings.6.cfg
./standard/module.gadman.3.cfg
./standard/systray.5.cfg
./standard/module.tasks.5.cfg
./standard/e_bryces.5.cfg
…
find . -type f
…
find: ./standard/module.conf.9.cfg: No such file or directory
find: ./standard/module.music_control.7.cfg: No such file or directory
find: ./standard/module.everything-apps.3.cfg: No such file or directory
find: ./standard/module.ibar.7.cfg: No such file or directory
find: ./standard/module.syscon.6.cfg: No such file or directory
…
No messages appear on the system console, even with vfs.hammer2.debug=1. Running cleanup/bulkfree does not fix it. Current kernel: 2e8a93d8402e.
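The symptom above, names that `find .` lists but that fail with "No such file or directory" as soon as the inode is needed, can be checked for with a short script. What follows is a minimal Python sketch and not part of the original report: `find_dead_entries` is a hypothetical helper that compares what the directory listing returns against what lstat() can resolve.

```python
import os
import stat


def find_dead_entries(root):
    """Return paths that appear in a directory listing but cannot be lstat()ed."""
    dead = []
    stack = [root]
    while stack:
        directory = stack.pop()
        # os.listdir() only reads directory entries; it does not need the inodes.
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)  # fails if the entry has no inode behind it
            except FileNotFoundError:
                # Caveat: a file removed between listdir() and lstat() would
                # also land here, so run this on a quiet filesystem.
                dead.append(path)
                continue
            if stat.S_ISDIR(st.st_mode):
                stack.append(path)
    return sorted(dead)


if __name__ == "__main__":
    for path in find_dead_entries("."):
        print(path)
```

Dangling symlinks are not reported, because lstat() examines the link itself and not its target.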
Files
- Status changed from New to In Progress
- Assignee set to dillon
I believe I fixed that one in the latest master with the sideq commit 7220df841c67. It won't fix the existing disconnect between inode nlinks and directory entries, but it should prevent new occurrences. We have not yet confirmed that the fix actually prevents such things from happening (testing now).
I will come up with a way to scrap the dead directory entries in an upcoming commit. It might be possible to just unlink("path") already, but 'rm' won't do it because 'rm' calls access() and lstat() first (which need the inode to exist).
-Matt
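To illustrate the point about 'rm' in the comment above: the difference is only in which system calls are made before unlink(). The following is a small Python sketch under that assumption; `unlink_without_stat` is a hypothetical name, and whether the HAMMER2 VFS actually accepts a bare unlink() on a dead entry is not confirmed anywhere in this thread.

```python
import errno
import os


def unlink_without_stat(path):
    """Call unlink(2) on path directly, with no access() or lstat() beforehand.

    Returns 0 on success, or the errno value reported by the kernel.
    """
    try:
        os.unlink(path)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        err = unlink_without_stat(target)
        if err:
            print(f"{target}: {errno.errorcode.get(err, err)}")
```

Returning the errno value makes it easy to see whether the kernel rejected the call with ENOENT or with something else.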
I have added a new directive to the hammer2 utility which can remove the dead directory entries: 'hammer2 destroy <path>'. Use it very carefully. The manual page has been updated. It needs a new kernel (with the updated hammer2 VFS) and a new hammer2 utility. Still considered alpha.
-Matt
Sorry, can't find the file:
===> sbin/hammer2
make[4]: make[4]: don't know how to make cmd_destroy.c. Stop
make[4]: stopped in /usr/src/sbin/hammer2
*** [_SUBDIR_depend_hammer2] Error code 2
Today I tested hammer2 destroy and it worked fine, yet 5 minutes later the host crashed (coredump attached). The host then crashed again while trying to write that crash dump to /var/crash, with the same traceback. Probably unrelated.
- Status changed from Resolved to Feedback
Oops, it's not over yet.
I hadn't paid attention to other parts of the filesystem, and some bad files were still present. I cleaned the whole disk of incorrect files (god bless backups) and found some more weird things:
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
If I do `hammer2 destroy filename`, one of them is destroyed.
I also noticed that after a crash, empty files may appear in place of newly created files.
I will clean up the drive and see how it behaves.
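Repeated names like the ones listed in this comment can be spotted without reading `ls` output by eye. The following is a minimal Python sketch, again not from the original thread; `duplicate_names` is a hypothetical helper that simply counts how often each name comes back from the directory listing.

```python
import os
from collections import Counter


def duplicate_names(directory):
    """Return {name: count} for names the directory listing reports more than once."""
    # On a healthy filesystem every name in a directory is unique,
    # so this dictionary is normally empty.
    counts = Counter(os.listdir(directory))
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, n in sorted(duplicate_names(directory).items()):
        print(f"{name}: listed {n} times")
```

The count is also a rough guide to how many times `hammer2 destroy` would need to run for a name, given that each invocation was seen to remove one entry.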
I hit the cluster_write() crash too, and it should now be fixed in master. It turned out to be a bug in devfs that only hammer2 tickles due to the way hammer2 uses cluster_write().
Also fixed a number of other issues this morning (see the git logs), including a corruption bug that can be triggered by a crash or panic.
-Matt
- Status changed from Feedback to Resolved
Yeah, that's much better. Now I can compile a new kernel without hitting a crash, and the host has been stable again for a couple of hours.