Bug #2298: computer reboots for no apparent reason - DragonFlyBSD - DragonFlyBSD bugtracker

Bug #2298

closed

computer reboots for no apparent reason

Added by phma over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: High
Assignee:
Category:
Target version:
Start date: 02/02/2012
Due date:
% Done:
Estimated time:

Description

This has happened twice apparently during the nightly hammer run, but also happens at other times. The computer reboots, without dumping a kernel core, so I have no way to tell what causes it. It happens at least once every two days and started recently. I'm running v3.1.0.93.g7ca562-DEVELOPMENT.

Files

dmesg.txt (14.2 KB), phma, 02/03/2012 01:15 AM
periodic.1yzLDY4pdN (2.35 KB), phma, 02/04/2012 07:47 PM
periodic.XTA5spE1qs (2.12 KB), phma, 02/04/2012 07:47 PM
periodic.zyJHMxiIjA (2.12 KB), phma, 02/04/2012 07:47 PM

Related issues 1 (0 open — 1 closed)

Updated by ahuete.devel over 13 years ago

Hi Pierre,

If it reboots, maybe there is something in dmesg. Can you please upload
the output of 'dmesg -a' somewhere?

Thanks,
Antonio Huete

Issue #2298 has been reported by Pierre Abbat.

----------------------------------------
Bug #2298: computer reboots for no apparent reason
http://bugs.dragonflybsd.org/issues/2298

Author: Pierre Abbat
Status: New
Priority: High
Assignee:
Category:
Target version:

Updated by phma over 13 years ago

File dmesg.txt added

Here's the output of dmesg -a. I doubt it has anything useful, as the computer rebooted when I was asleep, waited for the password, continued booting, and failed to start X, so I had to reboot it again.

Updated by robgar over 13 years ago

Turn off C-states in the BIOS, and let me know what happens.

Updated by phma over 13 years ago

File periodic.1yzLDY4pdN added
File periodic.XTA5spE1qs added
File periodic.zyJHMxiIjA added

Here are three partially written periodic files. Each one shows that hammer got to the reblock phase of maintaining a filesystem, but not the recopy phase. None of them got to /crypt0, which is a PFS in /crypt/.

Also, typing "mount" to see what's mounted triggers either the kernel debugger or a reboot.

Updated by phma over 13 years ago

Status changed from New to In Progress

I got a dump! I switched to the text console a few minutes before the periodic job was scheduled to start. A few minutes later, I got the kernel debugger. I called dumpsys and rebooted. It said "LK_RELEASE: no lock held." I'll pass it to Matt as soon as I can get a hold of him.

Updated by vsrinivas over 13 years ago

Could you post the backtrace from core.txt or from a kgdb session on the kernel dump? That'd be pretty useful for tracking down the errant LK_RELEASE.

Thanks!

Updated by vsrinivas over 13 years ago

Okay; the backtrace here:

panic: lockmgr: LK_RELEASE: no lock held
cpuid = 1
Trace beginning at frame 0xd35e6a44
panic(ffffffff,1,c06d3d58,d35e6a78,cc99f560) at panic+0x19e 0xc03839b3
panic(c06d3d58,ccb86be0,d35e6af0,ccb86e04,d35e6a9c) at panic+0x19e 0xc03839b3
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
nlookup(d35e6c70,c03e73b6,ce173f68,d35e6c78,d08a02e0) at nlookup+0x5fa 0xc03e6d80
kern_statfs(d35e6c70,d35e6b70,0,1,8c) at kern_statfs+0x29 0xc03f1bac
sys_statfs(d35e6cf0,d35e6d00,8,0,0) at sys_statfs+0x52 0xc03f1cf4
syscall2(d35e6d40) at syscall2+0x270 0xc065e28c
Xint0x80_syscall() at Xint0x80_syscall+0x36 0xc062d2d6
Debugger("panic")

We were finding an unlocked vnode of VT_PROCFS; it turned out to be the root vnode from linprocfs.
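For readers less used to this trace format: the frames are printed innermost first, so the call path into the panic reads bottom to top. The following is only a rough sketch for pulling the function names out of such a trace; the regular expression and the helper name `call_chain` are illustrative assumptions, not part of any DragonFly tool, and the sample is three frames copied from the trace above.

```python
import re

# Assumed frame shape, based only on the trace quoted in this report:
#   name(args...) at name+0xOFFSET 0xADDRESS
FRAME_RE = re.compile(r"^(\w+)\(.*\) at (\w+)\+0x[0-9a-f]+ 0x[0-9a-f]+$")

SAMPLE = """
lockmgr(d9f18eb0,6,d35e6ab8,c03ec45f,d9f18dd8) at lockmgr+0x4d0 0xc037550a
vn_unlock(d9f18dd8,ccb86be0,d35e6b24,c03e6d80,d9f18dd8) at vn_unlock+0x1e 0xc03f81cd
vput(d9f18dd8,d9f18dd8,c03d7fd1,c304d598,c2e881d8) at vput+0x12 0xc03ec45f
"""


def call_chain(trace_text):
    """Return function names outermost caller first, dropping repeated adjacent frames."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # header lines, Debugger("panic"), blank lines
        name = match.group(2)
        if not names or names[-1] != name:
            names.append(name)
    names.reverse()  # the trace lists the innermost frame first
    return names


if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE)))
```

Applied to the full trace, this kind of listing makes it easier to see that the statfs system call reached nlookup, which called vput on a vnode that was not locked.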

linprocfs_allocvp contained a vx_unlock() of the vnode it had just allocated in the new-vnode case (linprocfs_subr.c:223); this should be removed, so that a locked vnode is returned here. Otherwise, the root vnode is not locked when we are in the namecache.

linprocfs also should use a vhold_interlock() around its vget() loop, and perhaps shouldn't synchronize with the pfs_token. It is likely still racy; probably the same fixes done to procfs this release cycle should be applied to it?
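The lock imbalance described above can be illustrated with a toy model. This is not kernel code: `ToyVnode`, `allocvp`, `vput_like`, and `lookup_root` are hypothetical names, the model has a single lock flag, and it ignores the differences between the kernel's vnode lock states. It only shows why an allocator that drops the lock, paired with a caller that assumes a locked vnode, ends in a "no lock held" failure.

```python
class LockError(Exception):
    """Stands in for the kernel panic in this toy model."""


class ToyVnode:
    """Hypothetical stand-in for a vnode guarded by one exclusive lock."""

    def __init__(self, name):
        self.name = name
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        if not self.locked:
            raise LockError("LK_RELEASE: no lock held")
        self.locked = False


def allocvp(name, drop_lock_before_return):
    """Toy allocator: a freshly created vnode starts out locked.

    drop_lock_before_return=True models the extra unlock in the new-vnode path.
    """
    vp = ToyVnode(name)
    vp.lock()
    if drop_lock_before_return:
        vp.unlock()
    return vp


def vput_like(vp):
    """Toy caller that assumes it was handed a locked vnode and unlocks it."""
    vp.unlock()


def lookup_root(drop_lock_before_return):
    """Allocate the root vnode, then release it the way a lookup path would."""
    vp = allocvp("linprocfs-root", drop_lock_before_return)
    try:
        vput_like(vp)
    except LockError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print(lookup_root(True))   # allocator dropped the lock first
    print(lookup_root(False))  # allocator returned the vnode locked
```

In this sketch, removing the unlock from the allocator is what keeps the later release balanced, which mirrors the fix proposed for the new-vnode case.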

Updated by vsrinivas over 13 years ago

Any update on this?

Updated by phma over 13 years ago

I had forgotten to leave linprocfs mounted while hammer runs. I have verified that typing "mount" when linprocfs is mounted no longer causes a problem. It is mounted now; I'll let you know what happens tonight.

I have, since the fix, seen the computer reboot for no apparent reason when I tried to log on to an X session as another user. Any idea why?
