Bug #714

SMP kernel panic at boot: assertion: ((int)sr->sysid ..

Added by thomas.nikolajsen over 7 years ago. Updated over 7 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Description

Using HEAD I get this panic on every boot with an SMP kernel;
this is on a Pentium 4 w/ HTT.
A few months ago I had no problem using the same KERNCONF on this host.
The same KERNCONF works on AMD64 (one core).

Commit below introduced KKASSERT causing panic:
http://leaf.dragonflybsd.org/mailarchive/commits/2007-04/msg00211.html

-thomas

dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid (1.88 KB) thomas.nikolajsen, 06/28/2007 10:37 PM

dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.2 (3.87 KB) thomas.nikolajsen, 06/29/2007 10:24 PM

dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.3 (2.08 KB) thomas.nikolajsen, 07/02/2007 01:43 AM

History

#1 Updated by dillon over 7 years ago

:New submission from Thomas Nikolajsen <>:
:
:Using HEAD I get this panic on every boot with SMP kernel;
:this on pentium 4 w/ HTT.
:A few months ago I had no problem using same KERNCONF on this host.
:Same KERNCONF works on AMD64 (one core).
:
:Commit below introduced KKASSERT causing panic:
:http://leaf.dragonflybsd.org/mailarchive/commits/2007-04/msg00211.html
:
: -thomas

Woa. That's really odd, it shouldn't be possible for that to happen.
Are you sure you have the latest HEAD?

When it panics please do this from the db> prompt:

print *ncpus_fit
print *ncpus_fit_mask

-Matt

#2 Updated by thomas.nikolajsen over 7 years ago

Well, it is a full build of HEAD from June 27th.
I updated the source & rebuilt today, but that didn't change the panic.

Anyway, I guess it was a good idea that you put in the ASSERT :)

-thomas
-
db> print *ncpus_fit
2
db> print *ncpus_fit_mask
1
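
For context on the two values printed above: a Pentium 4 with HTT presents two logical CPUs, so a fit of 2 and a mask of 1 are the settled, post-SMP-startup values one would expect. A minimal sketch of that relationship, assuming ncpus_fit is ncpus rounded up to a power of 2 and ncpus_fit_mask is ncpus_fit - 1 (the helper names below are hypothetical and are not the kernel's code):

```c
#include <assert.h>

/* Hypothetical helper: smallest power of 2 that is >= ncpus. */
static int fit_pow2(int ncpus)
{
    int fit = 1;

    assert(ncpus >= 1);
    while (fit < ncpus)
        fit <<= 1;
    return fit;
}

/* Hypothetical helper: mask selecting the low bits covered by the fit. */
static int fit_mask(int ncpus)
{
    return fit_pow2(ncpus) - 1;
}
```

Under these assumptions, fit_pow2(2) and fit_mask(2) give the 2 and 1 shown at the db> prompt, while a kernel that still believes ncpus == 1 would be working with a fit of 1 and a mask of 0.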

#3 Updated by thomas.nikolajsen over 7 years ago

I looked into this by sprinkling kprintf's around:
gd_sysid_alloc==1 and gd_cpuid==0 when sysref_init, which panics, is called.

It turns out that problem is that ncpus isn't fixed for SMP:
it's initialized to 1 and later changed to number of CPUs.
(in /sys/platform/pc32/i386/mp_machdep.c)

sysref_ctor is called once while ncpus==1.

Simple fix (hack?) is to initialize ncpus to MAXCPU,
and set ncpus* accordingly.

-thomas
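
The sequence described in this comment can be modelled in a few lines of C. This is only a sketch under stated assumptions: each CPU's counter starts at its CPU id, every allocation adds the current power-of-2 CPU count, and the check compares the low bits of the new id with the CPU id. The full assertion text is truncated in the bug title, so the check here is a guess at its shape, and all names (cpu_model, model_alloc, model_invariant_ok, model_scenario) are hypothetical.

```c
/* Hypothetical model of one CPU's sysid allocator state. */
struct cpu_model {
    int  cpuid;        /* stands in for gd_cpuid */
    long sysid_alloc;  /* stands in for gd_sysid_alloc */
};

/* Advance the counter by the given stride and return the new id. */
static long model_alloc(struct cpu_model *cpu, int stride)
{
    cpu->sysid_alloc += stride;
    return cpu->sysid_alloc;
}

/* Assumed invariant: the low bits of an id identify the allocating CPU. */
static int model_invariant_ok(const struct cpu_model *cpu, long sysid, int mask)
{
    return (int)(sysid & mask) == cpu->cpuid;
}

/*
 * Perform early_allocs allocations on CPU 0 while ncpus == 1 (stride 1),
 * then one allocation after SMP startup (stride 2, mask 1).
 * Returns 1 if the assumed invariant still holds, 0 otherwise.
 */
static int model_scenario(int early_allocs)
{
    struct cpu_model cpu0 = { 0, 0 };
    long id;
    int i;

    for (i = 0; i < early_allocs; i++)
        model_alloc(&cpu0, 1);
    id = model_alloc(&cpu0, 2);
    return model_invariant_ok(&cpu0, id, 1);
}
```

With no early allocation the counter on CPU 0 stays even and the check passes. A single early allocation leaves the counter at 1, which matches the gd_sysid_alloc==1 and gd_cpuid==0 observation, and every later id is odd, so the check fails.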

#4 Updated by dillon over 7 years ago

:Thomas Nikolajsen <> added the comment:
:
:I looked into this by sprinkling kprintf's around:
:gd_sysid_alloc==1 and gd_cpuid==0 when sysref_init, which panics, is called.
:
:It turns out that problem is that ncpus isn't fixed for SMP:
:it's initialized to 1 and later changed to number of CPUs.
:(in /sys/platform/pc32/i386/mp_machdep.c)
:
:sysref_ctor is called once while ncpus==1.
:
:Simple fix (hack?) is to initialize ncpus to MAXCPU,
:and set ncpus* accordingly.
:
: -thomas

Ok. Let's find out where this is. Add a conditional that
checks for ncpus == 1 in sysref_ctor() and call db_print_backtrace()
along with your kprintf. Tell me what it says! e.g.

    sysref_ctor(...)
    {
        if (ncpus == 1) {
            kprintf("ncpus is one!!!!\n");
            db_print_backtrace();
            /* Debugger("blah"); OPTIONAL (continue booting with 'cont') */
        }
    }

I'm still coming up blanks. I added a check for ncpus == 1 in
sysref_ctor() on HEAD on my test box and it never gets hit. Maybe
your cvs repository is out of date or something... try cvsup'ing
directly from the master site or maybe even clean it all out and
cvsup a fresh copy from the master site.

-Matt

#5 Updated by thomas.nikolajsen over 7 years ago

DDB trace uploaded in file .3; symbols weren't set up yet (early in boot),
so I used 'nm -n' on kernel and added symbols to trace myself.
perfmon popped up; after removing that from my KERNCONF problem isn't seen.
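
Matching raw trace addresses against 'nm -n' output by hand amounts to finding, for each address, the last symbol whose start address is not greater than it, since 'nm -n' sorts symbols by address. A minimal sketch of that lookup, assuming the symbol table has already been parsed into a sorted array (the struct and function names are hypothetical):

```c
#include <stddef.h>

/* Hypothetical parsed form of one 'nm -n' line: start address and name. */
struct sym {
    unsigned long addr;
    const char   *name;
};

/*
 * Return the symbol containing pc, i.e. the last entry with addr <= pc,
 * or NULL if pc lies below the first symbol. tab must be sorted by addr.
 */
static const struct sym *sym_lookup(const struct sym *tab, size_t n,
                                    unsigned long pc)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first entry whose addr is greater than pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

The offset usually shown in a symbolized trace is then pc minus the addr of the returned entry.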

For a solution we could just use MAXCPU rounded up to the nearest power of 2
as the value added to gd_sysid_alloc to generate a new sysid (instead of adding ncpus).

It will give fewer sysids than the current scheme when ncpus is much smaller than MAXCPU,
but MAXCPU has to be supported anyway.

-thomas
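
The fixed-stride idea proposed in this comment can be sketched as follows. It assumes each CPU's counter starts at its CPU id and that the check masks the id with a power-of-2 value no larger than the stride. MODEL_MAXCPU is a hypothetical value picked for illustration, not the kernel's MAXCPU, and all function names are hypothetical as well.

```c
#define MODEL_MAXCPU 16  /* hypothetical limit, for illustration only */

/* MODEL_MAXCPU rounded up to a power of 2; used as a constant stride. */
static int model_maxcpu_fit(void)
{
    int fit = 1;

    while (fit < MODEL_MAXCPU)
        fit <<= 1;
    return fit;
}

/* Allocate the next id on one CPU; the stride never depends on ncpus. */
static long fixed_stride_alloc(long *sysid_alloc)
{
    *sysid_alloc += model_maxcpu_fit();
    return *sysid_alloc;
}

/* Assumed check: the low bits selected by the mask equal the CPU id. */
static int fixed_stride_ok(long sysid, int cpuid, int ncpus_fit_mask)
{
    return (int)(sysid & ncpus_fit_mask) == cpuid;
}
```

Because every id is the CPU id plus a multiple of the fixed stride, the low bits stay the same whether the check runs with the mask for ncpus == 1 or with the mask for the final CPU count, so an allocation made early in boot no longer matters. The cost is the one noted above: each CPU steps through the id space in larger increments.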

#6 Updated by dillon over 7 years ago

:Thomas Nikolajsen <> added the comment:
:
:DDB trace uploaded in file .3; symbols weren't set up yet (early in boot),
:so I used 'nm -n' on kernel and added symbols to trace myself.
:perfmon popped up; after removing that from my KERNCONF problem isn't seen.
:
:For a solution we could just use MAXCPU rounded up to the nearest power of 2
:as the value added to gd_sysid_alloc to generate a new sysid (instead of adding ncpus).
:
:It will give fewer sysids than the current scheme when ncpus is much smaller than MAXCPU,
:but MAXCPU has to be supported anyway.
:
: -thomas

This is easy. It's due to PERFMON trying to initialize its devices
way, way too early.

Fix coming up in a sec.

-Matt

#7 Updated by thomas.nikolajsen over 7 years ago

Ah, nice.

Btw: I did check my source tree for bit rot and didn't find any.
I compared it with a fresh checkout from both my local CVS repository and the primary one.

-thomas
