Interactive performance regression since 2.0 (patch #2)
Ok, here's a new patch to try:
This turned into a three-day marathon run but I'm really happy with
The old scheduler had a huge number of issues. I can't count the number
of issues I came across. I had to substantially rewrite the logic.
The new user scheduler does not depend on the helper threads when
switching under load. It uses a very cool algorithm whereby threads
trying to return back into user mode 'bid' for the single user mode
slot (per-cpu). So if thread A bids the best priority so far,
then switches to thread B (also trying to exit to user mode) which
bids a better priority, thread B will directly deschedule thread A
and take over the bid. The scheduler helper is only used to kick
idle cpus to pick up the threads that lost the bid. This results in
highly optimal operation.
I also found an issue with MP lock contention on SMP boxes. If the
LWKT scheduler could not acquire the MP lock it would continue looking
for threads that didn't need it. Sounds great, but what was happening
was that the scheduler was skipping important kernel threads needing
the lock and scheduling cpu-intensive user threads, and then not
rescheduling for a whole tick. I created two fixes for this issue
controllable live with a sysctl.
lwkt.chain_mplock=0 Causes the LWKT scheduler to refuse to
schedule a user thread if kernel threads
exist which need the MP lock but couldn't
lwkt.chain_mplock=1 Causes the LWKT scheduler to schedule the
user thread but adds logic to rel_mplock()
to attempt to IPI some other cpu needing
the MP lock, creating an event that allows
that other cpu to reschedule out of the user
I intend to commit this on Friday baring problems, so please test both
So far on my desktop I have tested lwkt.chain_mplock=0 with excellent
#4 Updated by dillon about 6 years ago
I am seeing better cpu resource use too. My buildworld -j 8 loop
went from 2760 seconds to 2560 seconds, with chain_mplock set to 0.
I am testeing with chain_mplock set to 1 right now.
BUILD 85 Thu Dec 4 10:56:02 PST 2008 -j 8
2765.91 real 3316.81 user 472.50 sys
BUILD 14 Wed Dec 17 08:21:39 PST 2008 -j 8
2570.78 real 3198.75 user 443.62 sys
The commit is slated for Friday. I am going to get the patch running
on leaf and pkgbox today as well.
#6 Updated by erik-wikstrom about 6 years ago
Did I get this right? Lets say that thread A wants to go back to user
mode, and since no other threads are running it does that. Then, right
after, thread B also wants to go back to user mode, and since it has
higher priority than thread A it will preempt thread A and start running
instead? And this is just for the case where a thread returns to user
mode, for threads already in user mode normal scheduling will occur?
#7 Updated by dillon about 6 years ago
:Did I get this right? Lets say that thread A wants to go back to user
:mode, and since no other threads are running it does that. Then, right
:after, thread B also wants to go back to user mode, and since it has
:higher priority than thread A it will preempt thread A and start running
:instead? And this is just for the case where a thread returns to user
:mode, for threads already in user mode normal scheduling will occur?
Let me explain it more fully.
LWKT - The core light weight thread scheduler used by DragonFly.
Handles the actual running of all execution contexts,
kernel or user.
USCHED - The user mode thread scheduler controls which one of N
user mode threads is allowed on the LWKT queue for any
given cpu at any given moment.
Every thread in the system will be running in either user mode or
kernel mode. Pure kernel threads always only run in kernel mode.
For any given cpu only ONE LWKT thread may be scheduled for user
mode operation. All remaining runnable LWKT threads will be running
in kernel mode.
So if you have, say, five threads running in user mode on one cpu four
of those will be held by the USCHED_BSD4 scheduler on its run queue
and only one will be held by the LWKT scheduler (the one actually
running). The USCHED_BSD4 schedule chooses which one of N threads that
want to run in user mode actually gets to run at any given point in
time on a particular cpu.
Any user mode thread can only block or be switched away while in
kernel mode. So, for example, if a user thread makes a system call
it will now be in kernel mode and no longer subject to USCHED_BSD4,
and USCHED_BSD4 will in fact be able to schedule another user mode
thread while the first one is sitting in kernel mode handling the
system call. Thus any user thread which blocks while making a system
call does so in kernel mode and does not effect USCHED's ability to
schedule another user mode thread as the 'one' user mode thread running
on any given cpu.
Ok, so what this maens is that from USCHED's point of view there is
only the 'current' user mode thread scheduled by LWKT, the usre mode
threads that previously tried to return from kernel mode back to user
mode but couldn't, which are sitting in USCHED's run queue, AND then
there are the threads runnin in kernel mode that are trying to return
to user mode. For example, if you hit a key in a ssh client the ssh
program was sitting blocked in select() in kernel mode waiting for the
key, and now that it has it is trying to return from the select() system
call back into user mode.
The USCHED algorithm centers on these kernel mode threads that are
trying to return to user mode. Kernel mode threads always have priority
over any thread already in user mode (of which only one can be schedule
to LWKT at any time).
So here is what happens: A kernel mode thread is trying to return to
user mode so it checks its dynamic user priority against the currently
scheduled user mode thread on that cpu. If its dynamic user priority
is better then the currently scheduled user mode thread then it takes
over and becomes the currently scheduled user mode thread. It
deschedules the OLD user mode thread from LWKT and returns it to the
USCHED run queue. However, this thread does not immediately return
to user mode. If there are still other runnable LWKT threads present
(running in kernel mode), it switches to them first, giving them a
chance to try to return to user mode and possibly take over the
'currently scheduled user mode thread' designation for that cpu.
Once all other LWKT threads running in kernel mode are exhausted the
last LWKT thread holding the bag will be the ONE user mode LWKT thread
left, and that thread returns to user mode. Any other user mode threads
that tried to return to user mode but couldn't will be sitting on the
USCHED run queue waiting for their turn.
Certain events will cause the current user mode thread to get interrupted.
For example, a scheduler clock tick or if some kernel mode LWKT thread
becomes scheduled. The current user mode LWKT thread will get
interrupted and enter kernel mode. If the event is a reschedule request
this thread will check the USCHED run queue to determine if it is still
the best user mode thread and if not it will deschedule itself, place
itself back onto the USCHED run queue, and schedule the best
thread from the run queue to take its place.
Similarly, as another example, if an interactive program sitting blocked
in the kernel waiting on an event gets woken up that will cause the
currently running user mode thread to take a AST fault with a LWKT
reschedule request. It will enter the kernel and see that there are LWKT
kernel threads present, and switch. If the target LWKT thread then tries
to return to user mode and has a better priority (which it will because
it is an interactive program), it deschedules the user mode thread that
was running and puts itself in the driver's seat so it can return to
user mode itself. The original user mode thread is handed back to USCHED.