Issue714

Title: SMP kernel panic at boot: assertion: ((int)sr->sysid ..
Priority: bug
Status: resolved
Superseder:
Nosy List: thomas.nikolajsen
Assigned To:
Keywords:

Created on 2007-06-28.22:37:29 by thomas.nikolajsen, last changed by thomas.nikolajsen.

Files
File name Uploaded Type
dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid thomas.nikolajsen, 2007-06-28.22:37:25 application/octet-stream
dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.2 thomas.nikolajsen, 2007-06-29.22:24:34 application/octet-stream
dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.3 thomas.nikolajsen, 2007-07-02.01:43:54 application/octet-stream
Messages
msg3139 Author: thomas.nikolajsen Date: 2007-07-02.02:38:51
Ah, nice.

Btw: I checked my source tree for bit rot and didn't find any.
I compared it with a fresh checkout from both my local CVS repository and the primary one.

 -thomas
msg3138 Author: dillon Date: 2007-07-02.01:56:00
:Thomas Nikolajsen <thomas.nikolajsen@mail.dk> added the comment:
:
:DDB trace uploaded in file .3; symbols weren't set up yet (early in boot),
:so I ran 'nm -n' on the kernel and added the symbols to the trace myself.
:perfmon popped up in it; after removing perfmon from my KERNCONF the
:problem is no longer seen.
:
:As a solution we could just use MAXCPU rounded up to the nearest power of 2
:as the value added to gd_sysid_alloc to generate a new sysid (instead of
:adding ncpus).
:
:It will give fewer sysids than the current scheme when ncpus is much smaller
:than MAXCPU, but MAXCPU has to be supported anyway.
:
: -thomas

    This is easy. It's due to PERFMON trying to initialize its devices
    way, way too early.

    Fix coming up in a sec.

						-Matt
msg3137 Author: thomas.nikolajsen Date: 2007-07-02.01:43:54
DDB trace uploaded in file .3; symbols weren't set up yet (early in boot),
so I ran 'nm -n' on the kernel and added the symbols to the trace myself.
perfmon popped up in it; after removing perfmon from my KERNCONF the
problem is no longer seen.

As a solution we could just use MAXCPU rounded up to the nearest power of 2
as the value added to gd_sysid_alloc to generate a new sysid (instead of
adding ncpus).

It will give fewer sysids than the current scheme when ncpus is much smaller
than MAXCPU, but MAXCPU has to be supported anyway.

 -thomas
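
The fixed-stride idea proposed above can be sketched in user-space C. This is only an illustration under stated assumptions, not kernel code: MODEL_MAXCPU is an assumed value picked for the example, roundup_pow2(), sysid_next_fixed() and sysid_to_cpu() are hypothetical helpers, and the per-cpu counter is assumed to be primed with the cpu id.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAXCPU 16	/* assumed value, for illustration only */

/* Smallest power of 2 that is >= n (requires n >= 1). */
static unsigned roundup_pow2(unsigned n)
{
	unsigned p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical allocator: advance the per-cpu counter by a stride that
 * depends only on MODEL_MAXCPU, never on the current value of ncpus.
 */
static int64_t sysid_next_fixed(int64_t *sysid_alloc)
{
	*sysid_alloc += roundup_pow2(MODEL_MAXCPU);
	return *sysid_alloc;
}

/* Recover the allocating cpu from the low bits of a sysid. */
static int sysid_to_cpu(int64_t sysid)
{
	return (int)(sysid & (roundup_pow2(MODEL_MAXCPU) - 1));
}
```

Because the stride never changes, the low bits of every sysid keep identifying the allocating cpu no matter how early in boot the first allocation happens. The cost is the one already noted: each allocation consumes a full MAXCPU-sized stride of the id space even on a machine with few cpus.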
msg3136 Author: dillon Date: 2007-07-01.04:04:00
:Thomas Nikolajsen <thomas.nikolajsen@mail.dk> added the comment:
:
:I looked into this by sprinkling kprintf's around:
:gd_sysid_alloc==1 and gd_cpuid==0 when sysref_init, which panics, is called.
:
:It turns out the problem is that ncpus isn't constant on SMP:
:it's initialized to 1 and later changed to the number of CPUs
:(in /sys/platform/pc32/i386/mp_machdep.c).
:
:sysref_ctor is called once while ncpus==1.
:
:A simple fix (hack?) is to initialize ncpus to MAXCPU,
:and set the ncpus* variables accordingly.
:
: -thomas

    Ok.  Let's find out where this is.  Add a conditional that
    checks for ncpus == 1 in sysref_ctor() and calls db_print_backtrace()
    along with your kprintf.   Tell me what it says! e.g.

    sysref_ctor(...)
    {
	if (ncpus == 1) {
	    kprintf("ncpus is one!!!!\n");
	    db_print_backtrace();
	    /* Debugger("blah"); OPTIONAL (continue booting with 'cont') */
	}
    }

    I'm still coming up blank.  I added a check for ncpus == 1 in
    sysref_ctor() on HEAD on my test box and it never gets hit.  Maybe
    your cvs repository is out of date or something... try cvsup'ing
    directly from the master site or maybe even clean it all out and
    cvsup a fresh copy from the master site.

					-Matt
msg3134 Author: thomas.nikolajsen Date: 2007-07-01.03:11:50
I looked into this by sprinkling kprintf's around:
gd_sysid_alloc==1 and gd_cpuid==0 when sysref_init, which panics, is called.

It turns out the problem is that ncpus isn't constant on SMP:
it's initialized to 1 and later changed to the number of CPUs
(in /sys/platform/pc32/i386/mp_machdep.c).

sysref_ctor is called once while ncpus==1.

A simple fix (hack?) is to initialize ncpus to MAXCPU,
and set the ncpus* variables accordingly.

 -thomas
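
The sequence reported above can be replayed in a small user-space C model. This is a sketch under stated assumptions, not the kernel code: struct gd_model, fit_pow2() and sysid_alloc_ok() are hypothetical names, the counter is assumed to start at the cpu id, and ncpus_fit is assumed to be ncpus rounded up to a power of 2 with ncpus_fit_mask = ncpus_fit - 1 (consistent with the values 2 and 1 printed from DDB in this issue).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-cpu fields named in the thread. */
struct gd_model {
	int	gd_cpuid;
	long	gd_sysid_alloc;
};

/* Smallest power of 2 >= n (requires n >= 1); models ncpus_fit. */
static int fit_pow2(int n)
{
	int fit = 1;

	assert(n >= 1);
	while (fit < n)
		fit <<= 1;
	return fit;
}

/*
 * Model of one sysid allocation: advance the counter by the current
 * stride and report whether the low bits still equal the cpu id,
 * which is the condition the KKASSERT in the title checks.
 */
static bool sysid_alloc_ok(struct gd_model *gd, int ncpus)
{
	int ncpus_fit = fit_pow2(ncpus);
	int ncpus_fit_mask = ncpus_fit - 1;

	gd->gd_sysid_alloc += ncpus_fit;
	return ((int)gd->gd_sysid_alloc & ncpus_fit_mask) == gd->gd_cpuid;
}
```

Starting from gd_sysid_alloc == 0 on cpu 0, one call with ncpus == 1 leaves the counter at 1, matching the kprintf output. The next call with ncpus == 2 produces 3, whose low bit no longer matches gd_cpuid, so the function returns false where the kernel would panic.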
msg3121 Author: thomas.nikolajsen Date: 2007-06-29.22:24:34
Well, it is a full build of HEAD from June 27th.
I updated the source and rebuilt today, but that didn't change the panic.

Anyway, I guess it was a good idea that you put in the ASSERT :)

 -thomas
-
db> print *ncpus_fit
       2
db> print *ncpus_fit_mask
       1
msg3117 Author: dillon Date: 2007-06-29.17:31:02
:New submission from Thomas Nikolajsen <thomas.nikolajsen@mail.dk>:
:
:Using HEAD I get this panic on every boot with an SMP kernel;
:this is on a Pentium 4 w/ HTT.
:A few months ago I had no problem using the same KERNCONF on this host.
:The same KERNCONF works on AMD64 (one core).
:
:The commit below introduced the KKASSERT causing the panic:
:http://leaf.dragonflybsd.org/mailarchive/commits/2007-04/msg00211.html
:
: -thomas

    Whoa.  That's really odd, it shouldn't be possible for that to happen.
    Are you sure you have the latest HEAD?

    When it panics please do this from the db> prompt:

    print *ncpus_fit
    print *ncpus_fit_mask

						-Matt
msg3114 Author: thomas.nikolajsen Date: 2007-06-28.22:37:25
Using HEAD I get this panic on every boot with an SMP kernel;
this is on a Pentium 4 w/ HTT.
A few months ago I had no problem using the same KERNCONF on this host.
The same KERNCONF works on AMD64 (one core).

The commit below introduced the KKASSERT causing the panic:
http://leaf.dragonflybsd.org/mailarchive/commits/2007-04/msg00211.html

 -thomas
History
Date                 User               Action  Args
2007-07-02 03:27:04  thomas.nikolajsen  set     status: chatting -> resolved
2007-07-02 02:38:55  thomas.nikolajsen  set     messages: + msg3139
2007-07-02 01:56:00  dillon             set     messages: + msg3138
2007-07-02 01:43:59  thomas.nikolajsen  set     files: + dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.3; messages: + msg3137
2007-07-01 04:04:01  dillon             set     messages: + msg3136
2007-07-01 03:11:54  thomas.nikolajsen  set     messages: + msg3134
2007-06-29 22:24:36  thomas.nikolajsen  set     files: + dfly-1.9.0.20070627.SMP.boot.panic.assertion.sysid.2; messages: + msg3121
2007-06-29 17:31:05  dillon             set     status: unread -> chatting; messages: + msg3117
2007-06-28 22:37:29  thomas.nikolajsen  create