Bug #714
SMP kernel panic at boot: assertion: ((int)sr->sysid ..
Status: closed
Description
Using HEAD, I get this panic on every boot with an SMP kernel,
on a Pentium 4 with HTT.
A few months ago I had no problem using the same KERNCONF on this host.
The same KERNCONF works on AMD64 (one core).
The commit below introduced the KKASSERT that causes the panic:
http://leaf.dragonflybsd.org/mailarchive/commits/2007-04/msg00211.html
-thomas
Updated by dillon over 17 years ago
:New submission from Thomas Nikolajsen <thomas.nikolajsen@mail.dk>:
Whoa. That's really odd; it shouldn't be possible for that to happen.
Are you sure you have the latest HEAD?
When it panics, please run this from the db> prompt:
print *ncpus_fit
print *ncpus_fit_mask
-Matt
Updated by thomas.nikolajsen over 17 years ago
Well, it is a full build of HEAD from June 27th.
I updated the source and rebuilt today, but the panic is unchanged.
Anyway, I guess it was a good idea that you put in the assertion :)
thomas
db> print *ncpus_fit
2
db> print *ncpus_fit_mask
1
Updated by thomas.nikolajsen over 17 years ago
I looked into this by sprinkling kprintf()s around:
gd_sysid_alloc==1 and gd_cpuid==0 when sysref_init(), which panics, is called.
It turns out the problem is that ncpus isn't constant on SMP:
it is initialized to 1 and later changed to the number of CPUs
(in /sys/platform/pc32/i386/mp_machdep.c).
sysref_ctor is called once while ncpus==1.
A simple fix (hack?) is to initialize ncpus to MAXCPU
and set the other ncpus* variables accordingly.
-thomas
Updated by dillon over 17 years ago
:Thomas Nikolajsen <thomas.nikolajsen@mail.dk> added the comment:
Ok. Let's find out where this is. Add a conditional to sysref_ctor()
that checks for ncpus == 1 and calls db_print_backtrace()
along with your kprintf. Tell me what it says! e.g.
sysref_ctor(...)
{
    /* Flag any call made while ncpus is still at its boot-time value of 1. */
    if (ncpus == 1) {
        kprintf("ncpus is one!!!!\n");
        db_print_backtrace();
        /* Debugger("blah"); OPTIONAL (continue booting with 'cont') */
    }
    /* ... rest of sysref_ctor() unchanged ... */
}
I'm still coming up blank. I added a check for ncpus == 1 in
sysref_ctor() on HEAD on my test box, and it never gets hit. Maybe
your CVS repository is out of date or something. Try cvsup'ing
directly from the master site, or even clean it all out and
cvsup a fresh copy from the master site.
-Matt
Updated by thomas.nikolajsen over 17 years ago
DDB trace uploaded in file .3; symbols weren't set up yet (early in boot),
so I ran 'nm -n' on the kernel and added the symbols to the trace myself.
perfmon showed up in the trace; after removing it from my KERNCONF, the problem no longer occurs.
As a solution, we could just add MAXCPU rounded up to the nearest power of 2
to gd_sysid_alloc to generate a new sysid (instead of adding ncpus).
This gives fewer sysids than the current scheme when ncpus << MAXCPU,
but MAXCPU has to be supported anyway.
-thomas
Updated by dillon over 17 years ago
:Thomas Nikolajsen <thomas.nikolajsen@mail.dk> added the comment:
This is easy. It's due to PERFMON trying to initialize its devices
way, way too early.
Fix coming up in a sec.
-Matt
Updated by thomas.nikolajsen over 17 years ago
Ah, nice.
Btw: I did check my source tree for bit rot and didn't find any;
I compared it with fresh checkouts from my local CVS repository and from the primary site.
-thomas