Bug #1634

panic: spin_lock: 0xe4ad1320, indefinitive wait!

Added by elekktretterr about 5 years ago. Updated almost 5 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Our DragonFly box panicked last night. It runs master from last week.

I wrote down some text from the panic:

mp_lock = 00000002; cpuid = 2

Trace beginning at frame 0xe42d3b88

panic(...)
panic(...)
exponential_backoff(ff010000, 1 ,3c, e4ad1320, c727dbb)
spin_lock_wr_contested(e4ad1320, 0000000)
crfree(e4ad12b0,2, e42f3cc4, c00ef73, e42f3c00)
nlookup_done(e42f3c00, e4960720, c0304fe3, 0, c0384b5c)
sys_lstat(e32f3cf0,6,0,0dd9fce18)
syscall2(e42f3d440)

Can anyone who understands this tell me what happened and how it can
be fixed?

Also, is there any way to automate this: when panic occurs, automatically
start a memory dump process and then automatically reboot? I've been going
to the data center way too many times recently to investigate why the box
has crashed and it would be very helpful.

Petr

History

#1 Updated by swildner about 5 years ago

On 22.12.2009 02:17, wrote:
> Also, is there any way to automate this: when panic occurs, automatically
> start a memory dump process and then automatically reboot? I've been going
> to the data center way too many times recently to investigate why the box
> has crashed and it would be very helpful.

Check out the DDB_UNATTENDED kernel option.

Sascha

#2 Updated by ahuete.devel about 5 years ago

Hi Petr,

Could you please upload the dump to somewhere where we can grab it?

Thanks,
Antonio Huete


#3 Updated by elekktretterr about 5 years ago

> Hi Petr,
>
> Could you please upload the dump to somewhere where we can grab it?
>

Hi Antonio,

Unfortunately I don't have a dump. When I got to the server and plugged in
a keyboard, it wasn't responding :(

#4 Updated by qhwt+dfly about 5 years ago

On Tue, Dec 22, 2009 at 12:17:49PM +1100, wrote:
> Our dragonfly box has got a panic last night. it runs master from last week.

The output from `uname -v' contains the version of the source tree
you compiled the kernel from. It might be helpful to determine
if the problem has already been solved in the newer versions.
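A minimal sketch of capturing that string for a bug report follows. Only `uname -v` comes from the suggestion above; the variable name and the label printed in front of the value are illustrative choices.

```shell
#!/bin/sh
# Capture the kernel version string reported by uname -v so it can be
# pasted into a bug report. The variable name is an illustrative choice.
kernel_version=$(uname -v)
printf 'kernel version: %s\n' "$kernel_version"
```

Including this line with a panic report lets readers compare the running kernel against later commits.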

Cheers.

#5 Updated by dillon almost 5 years ago

In the release code the spinlock is very short duration. All I
can think of is that there might be an MP race somewhere. There
are a few places in the release kernel where p_ucred is used as
if it were MPSAFE when, in fact, it was not MPSAFE.

Well, that spin lock was removed from master and the cred handling
code was rewritten.

-Matt
Matthew Dillon
<>

#6 Updated by elekktretterr almost 5 years ago

> The output from `uname -v' contains the version of the source tree
> you compiled the kernel from. It might be helpful to determine
> if the problem has already been solved in the newer versions.

It's master from last Wednesday, I believe. A couple of days later there was
this commit:

http://leaf.dragonflybsd.org/mailarchive/commits/2009-12/msg00133.html

So maybe it has been fixed. We'll have to wait and see I guess.

#7 Updated by ahuete.devel almost 5 years ago

Hi Petr,

As far as I know you don't have to recompile with DDB_UNATTENDED, just
change the sysctl debug.debugger_on_panic to 0.
Note that you need to have a correctly set dumpdev (which the
installer sets now in /etc/rc.conf to your swap device) in order to
get the dumps.

With that, once you get a panic, there will be a dump and the machine
should be restarted without intervention.
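The two settings described above could be checked and applied roughly as follows. This is a sketch under assumptions, not a tested procedure: the helper name `check_dumpdev` is hypothetical, treating a `dumpdev` value of `NO` as "disabled" is an assumption, and changing the sysctl requires root. The sysctl is only touched when the running kernel exposes it.

```shell
#!/bin/sh
# check_dumpdev FILE  (hypothetical helper)
# Succeeds if FILE, an rc.conf-style file, assigns dumpdev a value that is
# neither empty nor "NO". Only the last assignment in the file is considered.
check_dumpdev() {
    value=$(sed -n 's/^[[:space:]]*dumpdev=//p' "$1" 2>/dev/null | tail -n 1 | tr -d "\"'")
    [ -n "$value" ] && [ "$value" != "NO" ]
}

# Skip the debugger on panic, but only where this sysctl exists.
# Adding the same name=value pair to /etc/sysctl.conf keeps it across reboots.
if sysctl -n debug.debugger_on_panic >/dev/null 2>&1; then
    sysctl debug.debugger_on_panic=0
fi

# Report whether a dump device appears to be configured.
if check_dumpdev /etc/rc.conf; then
    echo "dumpdev appears to be set"
else
    echo "dumpdev does not appear to be set"
fi
```

Running the check before the next panic confirms that both halves are in place: the sysctl decides whether the machine stops in DDB, and dumpdev decides where the dump goes.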

Cheers,
Antonio Huete

2009/12/23 <>:
>> Petr,
>>
>> I guess you have dumpdev configured on the server, no? Since the first of
>> December we have had minidumps, which produce quite small dumps regardless
>> of how much memory the machine has. You will find the cores in
>> /var/crash as usual, so next time maybe we can catch the panic :)
>>
>
> There is no keyboard permanently attached to the server, though. If it
> panics, I need to plug one in, but then I can't type in the DDB screen, so I
> can only press the reset button. That is why I was asking whether there is
> any way to automate this, i.e. 1) the system panics and drops into DDB, 2) a
> dump is automatically created, 3) the system automatically reboots.
>
> I've been told there is a DDB_UNATTENDED kernel option, but will this
> create a dump before rebooting?
>
> Petr
>
>
>
