Bug #1183
closedInteractive performance regression since 2.0 (patch #2)
0%
Description
Ok, here's a new patch to try:
fetch http://apollo.backplane.com/DFlyMisc/sched02.patchThis turned into a three-day marathon run but I'm really happy with
    the results.The old scheduler had a huge number of issues.  I can't count the number
    of issues I came across.  I had to substantially rewrite the logic.The new user scheduler does not depend on the helper threads when
    switching under load.  It uses a very cool algorithm whereby threads
    trying to return back into user mode 'bid' for the single user mode
    slot (per-cpu).  So if thread A bids the best priority so far,
    then switches to thread B (also trying to exit to user mode) which
    bids a better priority, thread B will directly deschedule thread A
    and take over the bid.  The scheduler helper is only used to kick
    idle cpus to pick up the threads that lost the bid.  This results in
    highly optimal operation.I also found an issue with MP lock contention on SMP boxes.  If the
    LWKT scheduler could not acquire the MP lock it would continue looking
    for threads that didn't need it.  Sounds great, but what was happening
    was that the scheduler was skipping important kernel threads needing
    the lock and scheduling cpu-intensive user threads, and then not
    rescheduling for a whole tick.  I created two fixes for this issue
    controllable live with a sysctl.lwkt.chain_mplock=0        Causes the LWKT scheduler to refuse to
                schedule a user thread if kernel threads
                exist which need the MP lock but couldn't
                get it.lwkt.chain_mplock=1        Causes the LWKT scheduler to schedule the
                user thread but adds logic to rel_mplock()
                to attempt to IPI some other cpu needing
                the MP lock, creating an event that allows
                that other cpu to reschedule out of the user
                thread.I intend to commit this on Friday baring problems, so please test both
    modes.So far on my desktop I have tested lwkt.chain_mplock=0 with excellent
    results.-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>
       Updated by elekktretterr almost 17 years ago
      Updated by elekktretterr almost 17 years ago
      
    
    Hi Matt,
Just out of interest while talking about CPU schedulers. There was a project 
for the GSoC to write new scheduler using the Stride algorithm. Is this going 
to be committed?
Petr
       Updated by hasso almost 17 years ago
      Updated by hasso almost 17 years ago
      
    
    Me too. It uses available resources much better (CPU's running very close 
to 100% during make -j8 all the time) and my desktop is very responsive 
under load. Please commit. :)
Running default lwkt.chain_mplock=0 for now.
       Updated by dillon almost 17 years ago
      Updated by dillon almost 17 years ago
      
    
    :Hi Matt,
:Just out of interest while talking about CPU schedulers. There was a project 
:for the GSoC to write new scheduler using the Stride algorithm. Is this going 
:to be committed?
:
:Petr
Who was mentoring that project?   I'd like his opinion.-Matt
       Updated by dillon almost 17 years ago
      Updated by dillon almost 17 years ago
      
    
    I am seeing better cpu resource use too.  My buildworld -j 8 loop
    went from 2760 seconds to 2560 seconds, with chain_mplock set to 0.
    I am testeing with chain_mplock set to 1 right now.
BUILD 85 Thu Dec  4 10:56:02 PST 2008 -j 8
     2765.91 real      3316.81 user       472.50 sys
BUILD 14 Wed Dec 17 08:21:39 PST 2008 -j 8
     2570.78 real      3198.75 user       443.62 sys
The commit is slated for Friday.  I am going to get the patch running
    on leaf and pkgbox today as well.-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>
       Updated by erik-wikstrom almost 17 years ago
      Updated by erik-wikstrom almost 17 years ago
      
    
    Did I get this right? Lets say that thread A wants to go back to user
mode, and since no other threads are running it does that. Then, right
after, thread B also wants to go back to user mode, and since it has
higher priority than thread A it will preempt thread A and start running
instead? And this is just for the case where a thread returns to user
mode, for threads already in user mode normal scheduling will occur?
       Updated by dillon almost 17 years ago
      Updated by dillon almost 17 years ago
      
    
    :Did I get this right? Lets say that thread A wants to go back to user
:mode, and since no other threads are running it does that. Then, right
:after, thread B also wants to go back to user mode, and since it has
:higher priority than thread A it will preempt thread A and start running
:instead? And this is just for the case where a thread returns to user
:mode, for threads already in user mode normal scheduling will occur?
:
:-- 
:Erik Wikström
Let me explain it more fully.LWKT   - The core light weight thread scheduler used by DragonFly.
         Handles the actual running of all execution contexts,
         kernel or user.USCHED - The user mode thread scheduler controls which one of N
         user mode threads is allowed on the LWKT queue for any
         given cpu at any given moment.Every thread in the system will be running in either user mode or
    kernel mode.  Pure kernel threads always only run in kernel mode.For any given cpu only ONE LWKT thread may be scheduled for user
    mode operation.  All remaining runnable LWKT threads will be running
    in kernel mode.So if you have, say, five threads running in user mode on one cpu four
    of those will be held by the USCHED_BSD4 scheduler on its run queue
    and only one will be held by the LWKT scheduler (the one actually
    running).  The USCHED_BSD4 schedule chooses which one of N threads that
    want to run in user mode actually gets to run at any given point in
    time on a particular cpu.Any user mode thread can only block or be switched away while in
    kernel mode.  So, for example, if a user thread makes a system call
    it will now be in kernel mode and no longer subject to USCHED_BSD4,
    and USCHED_BSD4 will in fact be able to schedule another user mode
    thread while the first one is sitting in kernel mode handling the
    system call.  Thus any user thread which blocks while making a system
    call does so in kernel mode and does not effect USCHED's ability to
    schedule another user mode thread as the 'one' user mode thread running
    on any given cpu.--Ok, so what this maens is that from USCHED's point of view there is
    only the 'current' user mode thread scheduled by LWKT, the usre mode
    threads that previously tried to return from kernel mode back to user
    mode but couldn't, which are sitting in USCHED's run queue, AND then
    there are the threads runnin in kernel mode that are trying to return
    to user mode.  For example, if you hit a key in a ssh client the ssh
    program was sitting blocked in select() in kernel mode waiting for the
    key, and now that it has it is trying to return from the select() system
    call back into user mode.The USCHED algorithm centers on these kernel mode threads that are
    trying to return to user mode.  Kernel mode threads always have priority
    over any thread already in user mode (of which only one can be schedule
    to LWKT at any time).So here is what happens:  A kernel mode thread is trying to return to
    user mode so it checks its dynamic user priority against the currently
    scheduled user mode thread on that cpu.  If its dynamic user priority
    is better then the currently scheduled user mode thread then it takes
    over and becomes the currently scheduled user mode thread.  It
    deschedules the OLD user mode thread from LWKT and returns it to the
    USCHED run queue.  However, this thread does not immediately return
    to user mode.  If there are still other runnable LWKT threads present
    (running in kernel mode), it switches to them first, giving them a
    chance to try to return to user mode and possibly take over the
    'currently scheduled user mode thread' designation for that cpu.Once all other LWKT threads running in kernel mode are exhausted the
    last LWKT thread holding the bag will be the ONE user mode LWKT thread
    left, and that thread returns to user mode.  Any other user mode threads
    that tried to return to user mode but couldn't will be sitting on the
    USCHED run queue waiting for their turn.Certain events will cause the current user mode thread to get interrupted.
    For example, a scheduler clock tick or if some kernel mode LWKT thread
    becomes scheduled.  The current user mode LWKT thread will get
    interrupted and enter kernel mode.  If the event is a reschedule request
    this thread will check the USCHED run queue to determine if it is still
    the best user mode thread and if not it will deschedule itself, place
    itself back onto the USCHED run queue, and schedule the best
    thread from the run queue to take its place.Similarly, as another example, if an interactive program sitting blocked
    in the kernel waiting on an event gets woken up that will cause the 
    currently running user mode thread to take a AST fault with a LWKT
    reschedule request.  It will enter the kernel and see that there are LWKT
    kernel threads present, and switch.  If the target LWKT thread then tries
    to return to user mode and has a better priority (which it will because
    it is an interactive program), it deschedules the user mode thread that
    was running and puts itself in the driver's seat so it can return to
    user mode itself.  The original user mode thread is handed back to USCHED.-Matt
                    Matthew Dillon 
                    <dillon@backplane.com>