computer reboots for no apparent reason
This has happened twice, apparently during the nightly hammer run, but it also happens at other times. The computer reboots without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v220.127.116.11.g7ca562-DEVELOPMENT.
#1 Updated by ahuete.devel over 6 years ago
If it reboots, maybe there is something in dmesg. Can you please upload
the output of 'dmesg -a' somewhere?
> Issue #2298 has been reported by Pierre Abbat.
> Bug #2298: computer reboots for no apparent reason
> Author: Pierre Abbat
> Status: New
> Priority: High
> Target version:
> This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v18.104.22.168.g7ca562-DEVELOPMENT.
#4 Updated by phma over 6 years ago
- File periodic.1yzLDY4pdN added
- File periodic.XTA5spE1qs added
- File periodic.zyJHMxiIjA added
Here are three partially written periodic files. Each one shows that hammer got to the reblock phase of maintaining a filesystem, but not the recopy phase. None of them got to /crypt0, which is a pfs in /crypt/.
Also, typing "mount" to see what's mounted triggers either the kernel debugger or a reboot.
#5 Updated by phma over 6 years ago
- Status changed from New to In Progress
I got a dump! I switched to the text console a few minutes before the periodic job was scheduled to start. A few minutes later, I got the kernel debugger. I called dumpsys and rebooted. It said "LK_RELEASE: no lock held." I'll pass it to Matt as soon as I can get a hold of him.
#7 Updated by vsrinivas over 6 years ago
Okay; the backtrace here:
panic: lockmgr: LK_RELEASE: no lock held
cpuid = 1
Trace beginning at frame 0xd35e6a44
panic(ffffffff,1,c06d3d58,d35e6a78,cc99f560) at panic+0x19e 0xc03839b3
panic(c06d3d58,ccb86be0,d35e6af0,ccb86e04,d35e6a9c) at panic+0x19e 0xc03839b3
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
nlookup(d35e6c70,c03e73b6,ce173f68,d35e6c78,d08a02e0) at nlookup+0x5fa 0xc03e6d80
kern_statfs(d35e6c70,d35e6b70,0,1,8c) at kern_statfs+0x29 0xc03f1bac
sys_statfs(d35e6cf0,d35e6d00,8,0,0) at sys_statfs+0x52 0xc03f1cf4
syscall2(d35e6d40) at syscall2+0x270 0xc065e28c
Xint0x80_syscall() at Xint0x80_syscall+0x36 0xc062d2d6
We were finding an unlocked vnode of VT_PROCFS; it turned out to be the root vnode from linprocfs.
linprocfs_allocvp contained a vx_unlock() of the vnode it had just allocated in the new-vnode case (linprocfs_subr.c:223). This should be removed: the function should return a locked vnode here. With the stray unlock in place, the root vnode was not locked when we were in the namecache.
linprocfs also should use a vhold_interlock() around its vget() loop, and perhaps shouldn't synchronize with the pfs_token. It is likely still racy; probably the same fixes done to procfs this release cycle should be applied to it?
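As a rough illustration of the unbalanced unlock described above, here is a minimal user-space sketch in C. It is not the DragonFly kernel code: toy_vnode, toy_lock, toy_unlock, toy_allocvp and toy_lookup are hypothetical names invented for this example, and the lock is modelled as a plain counter. It only shows why an allocator that unlocks the vnode it is expected to return locked makes the caller's later release fail with "no lock held".

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a vnode and its lock. Hypothetical; NOT the real lockmgr. */
struct toy_vnode {
    int lock_count;              /* number of locks currently held */
};

static void toy_lock(struct toy_vnode *vp)
{
    vp->lock_count++;
}

/* Returns false where a real kernel would panic with "no lock held". */
static bool toy_unlock(struct toy_vnode *vp)
{
    if (vp->lock_count == 0) {
        fprintf(stderr, "toy_unlock: no lock held\n");
        return false;
    }
    vp->lock_count--;
    return true;
}

/*
 * Allocation path. The assumed contract is "return the vnode locked".
 * stray_unlock models the extra unlock in the new-vnode case.
 */
static void toy_allocvp(struct toy_vnode *vp, bool stray_unlock)
{
    vp->lock_count = 0;
    toy_lock(vp);
    if (stray_unlock)
        toy_unlock(vp);          /* breaks the contract: vnode comes back unlocked */
}

/*
 * Caller. It assumes it received a locked vnode and releases it when done,
 * much as the vput() -> vn_unlock() step does in the backtrace.
 */
static bool toy_lookup(bool stray_unlock)
{
    struct toy_vnode vn;

    toy_allocvp(&vn, stray_unlock);
    return toy_unlock(&vn);      /* fails if the allocator already unlocked */
}
```

Calling toy_lookup(false) releases the lock cleanly, while toy_lookup(true) reaches the release with nothing held, which is the condition the panic message reports.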
#9 Updated by phma over 6 years ago
I had forgotten to leave linprocfs mounted while hammer runs. I have verified that typing "mount" when linprocfs is mounted no longer causes a problem. It is mounted now; I'll let you know what happens tonight.
I have, since the fix, seen the computer reboot for no apparent reason when I tried to log on to an X session as another user. Any idea why?