Bug #2298

computer reboots for no apparent reason

Added by phma about 2 years ago. Updated about 2 years ago.

Status: Closed
Start date: 02/02/2012
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v3.1.0.93.g7ca562-DEVELOPMENT.

dmesg.txt (14.2 KB) phma, 02/03/2012 01:15 AM

periodic.1yzLDY4pdN (2.35 KB) phma, 02/04/2012 07:47 PM

periodic.XTA5spE1qs (2.12 KB) phma, 02/04/2012 07:47 PM

periodic.zyJHMxiIjA (2.12 KB) phma, 02/04/2012 07:47 PM


Related issues

Blocks Bug #2286: 3.0 release catchall ticket Closed 01/22/2012

History

#1 Updated by ahuete.devel about 2 years ago

Hi Pierre,

If it reboots, maybe there is something in dmesg. Can you please upload
the output of 'dmesg -a' somewhere?

Thanks,
Antonio Huete


#2 Updated by phma about 2 years ago

Here's the output of dmesg -a. I doubt it has anything useful, as the computer rebooted when I was asleep, waited for the password, continued booting, and failed to start X, so I had to reboot it again.

#3 Updated by robgar about 2 years ago

Turn off C-states in the BIOS and let me know what happens.

#4 Updated by phma about 2 years ago

Here are three partially written periodic files. Each one shows that hammer got to the reblock phase of maintaining a filesystem, but not the recopy phase. None of them got to /crypt0, which is a pfs in /crypt/.

Also, typing "mount" to see what's mounted triggers either the kernel debugger or a reboot.

#5 Updated by phma about 2 years ago

  • Status changed from New to In Progress

I got a dump! I switched to the text console a few minutes before the periodic job was scheduled to start. A few minutes later, I got the kernel debugger. I called dumpsys and rebooted. It said "LK_RELEASE: no lock held." I'll pass it to Matt as soon as I can get a hold of him.

#6 Updated by vsrinivas about 2 years ago

Could you post the backtrace from core.txt or from a kgdb session on the kernel dump? That'd be pretty useful for tracking down the errant LK_RELEASE.

Thanks!

#7 Updated by vsrinivas about 2 years ago

Okay; the backtrace here:

panic: lockmgr: LK_RELEASE: no lock held
cpuid = 1
Trace beginning at frame 0xd35e6a44
panic(ffffffff,1,c06d3d58,d35e6a78,cc99f560) at panic+0x19e 0xc03839b3
panic(c06d3d58,ccb86be0,d35e6af0,ccb86e04,d35e6a9c) at panic+0x19e 0xc03839b3
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
nlookup(d35e6c70,c03e73b6,ce173f68,d35e6c78,d08a02e0) at nlookup+0x5fa 0xc03e6d80
kern_statfs(d35e6c70,d35e6b70,0,1,8c) at kern_statfs+0x29 0xc03f1bac
sys_statfs(d35e6cf0,d35e6d00,8,0,0) at sys_statfs+0x52 0xc03f1cf4
syscall2(d35e6d40) at syscall2+0x270 0xc065e28c
Xint0x80_syscall() at Xint0x80_syscall+0x36 0xc062d2d6
Debugger("panic")

We were finding an unlocked vnode of VT_PROCFS; it turned out to be the root vnode from linprocfs.

linprocfs_allocvp (linprocfs_subr.c:223) contained a vx_unlock() of the vnode it had just allocated in the new-vnode case. This should be removed: the function should return a locked vnode here. Otherwise, the root vnode was not locked when we were in the namecache.
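
To illustrate the failure mode, here is a minimal userland sketch, not the kernel code. Every name in it (toy_lock, toy_vnode, toy_allocvp, toy_lookup_and_put) is hypothetical, and the lock is a plain flag rather than lockmgr. It assumes only what is described above: the allocator is expected to hand back a locked vnode, and the caller later releases that lock unconditionally, the way vput() does in the backtrace.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins; none of these names exist in the DragonFly kernel. */
struct toy_lock  { int held; };              /* 0 = unlocked, 1 = locked */
struct toy_vnode { struct toy_lock lock; };

void toy_lock_acquire(struct toy_lock *lk)
{
    lk->held = 1;
}

/* Returns 0 on success, -1 in the case where the real lockmgr panics. */
int toy_lock_release(struct toy_lock *lk)
{
    if (!lk->held) {
        fprintf(stderr, "toy_lock_release: no lock held\n");
        return -1;
    }
    lk->held = 0;
    return 0;
}

/*
 * Hypothetical allocator. With unlock_before_return set it mimics the
 * reported behaviour: the fresh vnode is unlocked before it is handed back.
 */
void toy_allocvp(struct toy_vnode *vp, bool unlock_before_return)
{
    toy_lock_acquire(&vp->lock);
    if (unlock_before_return)
        (void)toy_lock_release(&vp->lock);
}

/* Caller side: like vput(), it assumes the vnode arrives locked. */
int toy_lookup_and_put(bool unlock_before_return)
{
    struct toy_vnode vn = { { 0 } };

    toy_allocvp(&vn, unlock_before_return);
    return toy_lock_release(&vn.lock);
}
```

Calling toy_lookup_and_put(false) releases cleanly, while toy_lookup_and_put(true) hits the "no lock held" branch, which is the toy equivalent of the LK_RELEASE panic in the trace.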

linprocfs also should use a vhold_interlock() around its vget() loop, and perhaps shouldn't synchronize with the pfs_token. It is likely still racy; probably the same fixes done to procfs this release cycle should be applied to it?

#8 Updated by vsrinivas about 2 years ago

Any update on this?

#9 Updated by phma about 2 years ago

I had forgotten to leave linprocfs mounted while hammer runs. I have verified that typing "mount" when linprocfs is mounted no longer causes a problem. It is mounted now; I'll let you know what happens tonight.

I have, since the fix, seen the computer reboot for no apparent reason when I tried to log on to an X session as another user. Any idea why?

#10 Updated by phma about 2 years ago

The computer ran its periodic and stayed up all night.

#11 Updated by herrgard about 2 years ago

Can this be closed?

#12 Updated by phma about 2 years ago

  • Status changed from In Progress to Closed

Yes.
