Bug #2298

closed

computer reboots for no apparent reason

Added by phma almost 13 years ago. Updated over 12 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
Start date: 02/02/2012
Due date:
% Done: 0%
Estimated time:

Description

This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v3.1.0.93.g7ca562-DEVELOPMENT.


Files

dmesg.txt (14.2 KB), phma, 02/03/2012 01:15 AM
periodic.1yzLDY4pdN (2.35 KB), phma, 02/04/2012 07:47 PM
periodic.XTA5spE1qs (2.12 KB), phma, 02/04/2012 07:47 PM
periodic.zyJHMxiIjA (2.12 KB), phma, 02/04/2012 07:47 PM

Related issues 1 (0 open, 1 closed)

Blocks Bug #2286: 3.0 release catchall ticket (Closed, justin, 01/22/2012)

Actions #1

Updated by ahuete.devel almost 13 years ago

Hi Pierre,

If it reboots maybe there is something in dmesg, can you please upload
the output of 'dmesg -a' somewhere?

Thanks,
Antonio Huete

Issue #2298 has been reported by Pierre Abbat.

----------------------------------------
Bug #2298: computer reboots for no apparent reason
http://bugs.dragonflybsd.org/issues/2298

Author: Pierre Abbat
Status: New
Priority: High
Assignee:
Category:
Target version:

Actions #2

Updated by phma almost 13 years ago

Here's the output of dmesg -a. I doubt it has anything useful, as the computer rebooted when I was asleep, waited for the password, continued booting, and failed to start X, so I had to reboot it again.

Actions #3

Updated by robgar almost 13 years ago

turn off cstates in bios, and let me know what happens.

Actions #4

Updated by phma almost 13 years ago

Here are three partially written periodic files. Each one shows that hammer got to the reblock phase of maintaining a filesystem, but not the recopy phase. None of them got to /crypt0, which is a pfs in /crypt/.

Also, typing "mount" to see what's mounted triggers either the kernel debugger or a reboot.

Actions #5

Updated by phma almost 13 years ago

  • Status changed from New to In Progress

I got a dump! I switched to the text console a few minutes before the periodic job was scheduled to start. A few minutes later, I got the kernel debugger. I called dumpsys and rebooted. It said "LK_RELEASE: no lock held." I'll pass it to Matt as soon as I can get a hold of him.

Actions #6

Updated by vsrinivas almost 13 years ago

Could you post the backtrace from core.txt or from a kgdb session on the kernel dump? That'd be pretty useful for tracking down the errant LK_RELEASE.

Thanks!

Actions #7

Updated by vsrinivas almost 13 years ago

Okay; the backtrace here:

panic: lockmgr: LK_RELEASE: no lock held
cpuid = 1
Trace beginning at frame 0xd35e6a44
panic(ffffffff,1,c06d3d58,d35e6a78,cc99f560) at panic+0x19e 0xc03839b3
panic(c06d3d58,ccb86be0,d35e6af0,ccb86e04,d35e6a9c) at panic+0x19e 0xc03839b3
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
nlookup(d35e6c70,c03e73b6,ce173f68,d35e6c78,d08a02e0) at nlookup+0x5fa 0xc03e6d80
kern_statfs(d35e6c70,d35e6b70,0,1,8c) at kern_statfs+0x29 0xc03f1bac
sys_statfs(d35e6cf0,d35e6d00,8,0,0) at sys_statfs+0x52 0xc03f1cf4
syscall2(d35e6d40) at syscall2+0x270 0xc065e28c
Xint0x80_syscall() at Xint0x80_syscall+0x36 0xc062d2d6
Debugger("panic")

We were finding an unlocked vnode of VT_PROCFS; it turned out to be the root vnode from linprocfs.

linprocfs_allocvp contained a vx_unlock() of the vnode it had just allocated in the new-vnode case (linprocfs_subr.c:223); this should be removed, so that a locked vnode is returned here. Without that, the root vnode was not locked when we were in the namecache.

linprocfs also should use a vhold_interlock() around its vget() loop, and perhaps shouldn't synchronize with the pfs_token. It is likely still racy; probably the same fixes done to procfs this release cycle should be applied to it?
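
To illustrate the failure mode described above, here is a toy user-space sketch in C. It is not DragonFly kernel code: struct toy_vnode, toy_lock(), toy_unlock(), toy_allocvp() and toy_lookup() are all hypothetical names made up for this example, and the "panic" is modelled as a -1 return value. It only shows why an allocator that drops the lock before returning breaks a caller that expects a locked vnode and releases it later.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a vnode with one exclusive lock (assumption: not the
 * kernel's struct vnode, just enough state to show the invariant). */
struct toy_vnode {
    int lock_count;   /* 0 = unlocked, 1 = held */
};

/* Acquire the toy lock; the vnode must not already be locked. */
static void toy_lock(struct toy_vnode *vp)
{
    assert(vp->lock_count == 0);
    vp->lock_count = 1;
}

/* Release the toy lock. Returns -1 in the situation where the real
 * lockmgr panicked with "LK_RELEASE: no lock held". */
static int toy_unlock(struct toy_vnode *vp)
{
    if (vp->lock_count == 0) {
        fprintf(stderr, "toy panic: LK_RELEASE: no lock held\n");
        return -1;
    }
    vp->lock_count = 0;
    return 0;
}

/* Hypothetical allocator: sets up a new vnode and locks it. When
 * unlock_before_return is nonzero it also drops the lock before
 * returning, which models the extra unlock discussed above. */
static void toy_allocvp(struct toy_vnode *vp, int unlock_before_return)
{
    vp->lock_count = 0;
    toy_lock(vp);
    if (unlock_before_return)
        (void)toy_unlock(vp);
}

/* Hypothetical caller: like a lookup path, it expects a locked vnode
 * back and releases it when it is done. */
static int toy_lookup(int unlock_before_return)
{
    struct toy_vnode vp;

    toy_allocvp(&vp, unlock_before_return);
    return toy_unlock(&vp);
}
```

In this sketch toy_lookup(1) hits the "no lock held" path because the allocator already released the lock, while toy_lookup(0) releases it exactly once. The real fix may of course involve more than this, as the remarks about the vget() loop suggest.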

Actions #8

Updated by vsrinivas almost 13 years ago

Any update on this?

Actions #9

Updated by phma almost 13 years ago

I had forgotten to leave linprocfs mounted while hammer runs. I have verified that typing "mount" when linprocfs is mounted no longer causes a problem. It is mounted now; I'll let you know what happens tonight.

I have, since the fix, seen the computer reboot for no apparent reason when I tried to log on to an X session as another user. Any idea why?

Actions #10

Updated by phma almost 13 years ago

The computer ran its periodic and stayed up all night.

Actions #11

Updated by herrgard over 12 years ago

Can this be closed?

Actions #12

Updated by phma over 12 years ago

  • Status changed from In Progress to Closed

Yes.
