Bug #3056
closed
HAMMER2 corruption - nonexistent files
Description
After one crash (like a week ago) I got one dir damaged:
find . | wc -l
144
find . -type f |& wc -l
142
find .
…
./standard/module.time.9.cfg
./standard/module.notification.3.cfg
./standard/e_bindings.6.cfg
./standard/module.gadman.3.cfg
./standard/systray.5.cfg
./standard/module.tasks.5.cfg
./standard/e_bryces.5.cfg
…
find . -type f
…
find: ./standard/module.conf.9.cfg: No such file or directory
find: ./standard/module.music_control.7.cfg: No such file or directory
find: ./standard/module.everything-apps.3.cfg: No such file or directory
find: ./standard/module.ibar.7.cfg: No such file or directory
find: ./standard/module.syscon.6.cfg: No such file or directory
…
No messages on the system console, even with vfs.hammer2.debug=1. Not fixed by cleanup/bulkfree. Current kernel: 2e8a93d8402e.
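The mismatch above (names that `find .` lists but that `find . -type f` cannot stat) can be checked for in a single directory with a short script. This is only a sketch in Python, not something from the report: the function name `dangling_entries` and the injectable `lstat` parameter are made up for illustration.

```python
import os


def dangling_entries(directory, lstat=os.lstat):
    """Return names that the directory lists but that cannot be stat'ed.

    `lstat` is injectable only so the function can be exercised on a
    healthy filesystem; by default the real os.lstat is used.
    """
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            lstat(path)
        except FileNotFoundError:
            # The directory entry exists, but looking it up fails with ENOENT.
            missing.append(name)
    return missing
```

On the damaged directory, `dangling_entries("./standard")` would be expected to return names such as `module.conf.9.cfg`; on a healthy filesystem it should return an empty list. It does not recurse into subdirectories.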
Files
Updated by dillon about 7 years ago
- Status changed from New to In Progress
- Assignee set to dillon
I believe I fixed that one in the latest master with the sideq commit 7220df841c67. It won't fix the existing disconnect between inode nlinks and directory entries, but it should prevent new occurrences. We have not yet confirmed that the fix actually prevents such things from happening (testing now).
I will come up with a way to scrap the dead directory entries in an upcoming commit. It might be possible to just unlink("path") already, but 'rm' won't do it because 'rm' calls access() and lstat() first (which need the inode to exist).
-Matt
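The idea of skipping `rm` and calling unlink() directly can be tried from a script, since Python's `os.unlink` on POSIX systems issues the unlink system call without a preceding access() or lstat(). The helper name `unlink_directly` is hypothetical, and whether the kernel accepts the call for a dead entry is exactly the open question in the comment above, so treat this as an experiment rather than a fix.

```python
import errno
import os


def unlink_directly(path):
    """Call unlink(2) on `path` without stat'ing it first.

    Returns None on success, or the errno name (e.g. 'ENOENT') on failure.
    """
    try:
        os.unlink(path)  # no access() or lstat() beforehand
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

A return value of `None` means the entry was removed; anything else is the error the kernel reported, which is worth recording before trying other tools.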
Updated by dillon about 7 years ago
I have added a new directive to the hammer2 utility which can remove the dead directory entries. 'hammer2 destroy <path>'. Use very carefully. Manual page has been updated. Needs a new kernel (w/updated hammer2 VFS), and a new hammer2 utility to use. Still considered alpha.
-Matt
Updated by arcade@b1t.name about 7 years ago
- File core.txt.31 added
- Status changed from In Progress to Resolved
Today I tested hammer2 destroy and it worked fine, but 5 minutes later the host crashed (core dump attached). The host then crashed again on attempts to write that crash dump to /var/crash, with the same traceback. Probably unrelated.
Updated by arcade@b1t.name about 7 years ago
- Status changed from Resolved to Feedback
Oops, it's not over yet.
I hadn't paid attention to other parts of the fs, and some wrong files were still present. I cleaned the whole disk of incorrect files (god bless backups) and found some more weird things:
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw-r--r-- 1 root wheel 4 19-Sep-2017 19:23 slim.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 4 19-Sep-2017 19:23 smartd.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
-rw------- 1 root wheel 3 19-Sep-2017 19:22 syslog.pid
If I do `hammer2 destroy filename` one of them is destroyed.
I also noticed that after the crash simple empty files may appear in the place of newly created files.
Will clean up the drive and check how it would work.
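Repeated names like those in the listing above can be counted from a plain directory read. A rough Python sketch, with hypothetical helper names `duplicate_names` and `duplicate_entries`; it only reports duplicates and does not remove anything.

```python
import os
from collections import Counter


def duplicate_names(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_entries(directory):
    """Apply duplicate_names() to the raw names the directory returns."""
    return duplicate_names(os.listdir(directory))
```

On a healthy filesystem `duplicate_entries()` should always return an empty dict, so any non-empty result points at the kind of damage described here.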
Updated by dillon about 7 years ago
I hit the cluster_write() crash too, and it should now be fixed in master. It turned out to be a bug in devfs that only hammer2 tickles due to the way hammer2 uses cluster_write().
Also fixed a number of other issues this morning (see the git logs), including a corruption bug that can be triggered by a crash or panic.
-Matt
Updated by arcade@b1t.name about 7 years ago
- Status changed from Feedback to Resolved
Yeah, that's much better. Now I can compile a new kernel without hitting a crash, and the host has been stable again for a couple of hours.