Bug #2298
closed
computer reboots for no apparent reason
0%
Description
This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v3.1.0.93.g7ca562-DEVELOPMENT.
Updated by ahuete.devel almost 13 years ago
Hi Pierre,
If it reboots maybe there is something in dmesg, can you please upload
the output of 'dmesg -a' somewhere?
Thanks,
Antonio Huete
Issue #2298 has been reported by Pierre Abbat.
----------------------------------------
Bug #2298: computer reboots for no apparent reason
http://bugs.dragonflybsd.org/issues/2298
Author: Pierre Abbat
Status: New
Priority: High
Assignee:
Category:
Target version:

This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v3.1.0.93.g7ca562-DEVELOPMENT.
Updated by robgar almost 13 years ago
Turn off C-states in the BIOS and let me know what happens.
Updated by phma almost 13 years ago
- File periodic.1yzLDY4pdN added
- File periodic.XTA5spE1qs added
- File periodic.zyJHMxiIjA added
Here are three partially written periodic files. Each one shows that hammer got to the reblock phase of maintaining a filesystem, but not the recopy phase. None of them got to /crypt0, which is a PFS in /crypt/.
Also, typing "mount" to see what's mounted triggers either the kernel debugger or a reboot.
Updated by phma almost 13 years ago
- Status changed from New to In Progress
I got a dump! I switched to the text console a few minutes before the periodic job was scheduled to start. A few minutes later, I got the kernel debugger, which said "LK_RELEASE: no lock held." I called dumpsys and rebooted. I'll pass the dump to Matt as soon as I can get hold of him.
Updated by vsrinivas almost 13 years ago
Could you post the backtrace from core.txt or from a kgdb session on the kernel dump? That'd be pretty useful for tracking down the errant LK_RELEASE.
Thanks!
Updated by vsrinivas almost 13 years ago
Okay; the backtrace here:
panic: lockmgr: LK_RELEASE: no lock held
cpuid = 1
Trace beginning at frame 0xd35e6a44
panic(ffffffff,1,c06d3d58,d35e6a78,cc99f560) at panic+0x19e 0xc03839b3
panic(c06d3d58,ccb86be0,d35e6af0,ccb86e04,d35e6a9c) at panic+0x19e 0xc03839b3
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
nlookup(d35e6c70,c03e73b6,ce173f68,d35e6c78,d08a02e0) at nlookup+0x5fa 0xc03e6d80
kern_statfs(d35e6c70,d35e6b70,0,1,8c) at kern_statfs+0x29 0xc03f1bac
sys_statfs(d35e6cf0,d35e6d00,8,0,0) at sys_statfs+0x52 0xc03f1cf4
syscall2(d35e6d40) at syscall2+0x270 0xc065e28c
Xint0x80_syscall() at Xint0x80_syscall+0x36 0xc062d2d6
Debugger("panic")
We were finding an unlocked vnode of VT_PROCFS; it turned out to be the root vnode from linprocfs.
linprocfs_allocvp contained a vx_unlock() of the vnode it had just allocated in the new-vnode case (linprocfs_subr.c:223); this should be removed, because the function should return a locked vnode here. With the unlock in place, the root vnode was not locked when we were in the namecache, so the later vput() tried to release a lock nobody held.
linprocfs also should use a vhold_interlock() around its vget() loop, and perhaps shouldn't synchronize with the pfs_token. It is likely still racy; probably the same fixes done to procfs this release cycle should be applied to it?
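To illustrate the failure mode described above, here is a minimal userland sketch in C. It is a toy model, not DragonFly code: struct toy_vnode, toy_lock(), toy_unlock(), toy_allocvp_buggy(), toy_allocvp_fixed() and toy_vput() are all hypothetical names invented for this example, and the toy unlock returns -1 at the point where the real lockmgr would panic with "LK_RELEASE: no lock held".

```c
#include <stdbool.h>

/* Toy stand-in for a vnode and its lock; NOT the DragonFly structures. */
struct toy_vnode {
    bool locked;
};

/* Acquire the toy lock. */
void toy_lock(struct toy_vnode *vp)
{
    vp->locked = true;
}

/*
 * Release the toy lock. Returns -1 if no lock is held, which is the
 * condition where the real lockmgr panics instead of returning.
 */
int toy_unlock(struct toy_vnode *vp)
{
    if (!vp->locked)
        return -1;
    vp->locked = false;
    return 0;
}

/* Buggy allocator: drops the lock on the new vnode before returning it. */
void toy_allocvp_buggy(struct toy_vnode *vp)
{
    toy_lock(vp);
    toy_unlock(vp);   /* the extra unlock that should be removed */
}

/* Fixed allocator: returns the new vnode still locked. */
void toy_allocvp_fixed(struct toy_vnode *vp)
{
    toy_lock(vp);
}

/* Caller side, modelled on vput(): assumes the vnode is locked and unlocks it. */
int toy_vput(struct toy_vnode *vp)
{
    return toy_unlock(vp);
}
```

In this model, toy_vput() on a vnode from toy_allocvp_buggy() returns -1 (the double release), while the same call on a vnode from toy_allocvp_fixed() returns 0. The sketch ignores the races mentioned above; it only shows why the caller and the allocator must agree on who holds the lock.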
Updated by phma almost 13 years ago
I had forgotten to leave linprocfs mounted while hammer runs. I have verified that typing "mount" when linprocfs is mounted no longer causes a problem. It is mounted now; I'll let you know what happens tonight.
I have, since the fix, seen the computer reboot for no apparent reason when I tried to log on to an X session as another user. Any idea why?
Updated by phma almost 13 years ago
The computer ran its periodic and stayed up all night.