Bug #774

mp_lock related panic (it seems)

Added by rumcic about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%

Description

http://www.rumko.net/dragonfly/DSC00129.JPG

Because savecore says there were no core dumps after the reboot, I can only
provide this picture for now.

It does not occur on every reboot: after the panic, the machine booted up
properly on the next attempt (after fsck-ing all partitions).

How should I start debugging this?
--
Regards,
Rumko

History

#1 Updated by dillon about 7 years ago

It will usually only say there is no core dump if the kernel binary
doesn't match the saved core or if it is checking for the core in the
wrong place.

When it happens again, try running 'savecore -f /var/crash' after
rebooting to force it to generate a core. If you are booting from a
kernel binary other than /kernel, use the -N option (see the savecore(8)
man page) to specify the location of the kernel binary.

-Matt
Matthew Dillon

#2 Updated by rumcic almost 7 years ago

I was finally able to get a usable core dump with the latest -HEAD (when
forcing the core dump I always got a 0-byte vmcore). It's at
leaf:~rumko/crash/dumps/*.6

#3 Updated by dillon almost 7 years ago

It died while running FAH504-Linux.exe. The only place in the trap
code that asserts that the MP lock is held is the branch taken when
in_vm86call is set.

I think what may be happening here is that a VM86 call is blocking
somewhere and causing a thread switch to occur. Then some other,
possibly unrelated, process takes an AST fault and hits that assertion,
which is made BEFORE the trap code checks whether the frame is in an
emulated VM86 context or not.

    if (in_vm86call) {                   <<<<< This is a global
        ASSERT_MP_LOCK_HELD(curthread);  <<<<< check made too early
        if (frame->tf_eflags & PSL_VM &&
            (type == T_PROTFLT || type == T_STKFLT)) {

                                         <<<<< check should be made down here
                                               (and it is)
        ...

I think the assertion is bogus and the solution is to just remove it.
I will commit this right now and have also included the patch below.

-Matt
Matthew Dillon

Index: i386/trap.c
===================================================================
RCS file: /cvs/src/sys/platform/pc32/i386/trap.c,v
retrieving revision 1.107
diff -u -p -r1.107 trap.c
--- i386/trap.c 25 Jul 2007 18:21:35 -0000 1.107
+++ i386/trap.c 7 Sep 2007 19:27:50 -0000
@@ -464,7 +464,6 @@ type = frame->tf_trapno;
 	code = frame->tf_err;
 
 	if (in_vm86call) {
-		ASSERT_MP_LOCK_HELD(curthread);
 		if (frame->tf_eflags & PSL_VM &&
 		    (type == T_PROTFLT || type == T_STKFLT)) {
 #ifdef SMP

#4 Updated by rumcic almost 7 years ago

Seems to be working now.

Thank you Matt.
