cvs commit: src/sys/conf files src/sys/ddb db_ps.c src/sys/i386/i386 trap.c src/sys/kern init_main.c kern_synch.c kern_usched.c lwkt_thread.c usched_bsd4.c usched_dummy.c src/sys/sys globaldata.h thread.h usched.h

Dmitri Nikulin dnikulin at gmail.com
Tue May 30 15:53:10 PDT 2006


On 5/31/06, Alex Burke <alexjeffburke at xxxxxxxxx> wrote:
> :While I am here, can the schedulers allow userspace to specify a cpumask
> :a thread is willing to run on? Some high-end software, e.g. database
> :systems, may want this feature, as they intend to manage their CPU
> :locality the way they do for raw devices.
> Hi,
>
> It is quite possible that I misunderstand something here, so I
> apologise if this is a stupid question, but: I thought one of the
> goals of DragonFlyBSD (one which, from what I have read, is quite
> novel and makes much logical sense) is that threads are not moved
> between CPUs, *UNLIKE* the FreeBSD 5+ series, which does allow
> threads to migrate between CPUs. That is the main use of LWKT, and
> it is what allows lockless algorithms to be used.
>
> Therefore, I guess what I do not understand is how giving a process
> a CPU mask that allows it (e.g. on a quad-core machine) to run on,
> say, two processors would translate into the DFly model - does it
> mean that if the process has two LWPs, their associated kernel
> threads would be split between the CPUs under their respective LWKT
> schedulers? Or am I somehow confusing kernel space and user space?
> How would a single process with a single associated LWP, and thus a
> single kernel thread (which I believe have a 1:1 relationship),
> benefit from a CPU mask specifying two or more CPUs to run on?
It could be exempt from the load-balancing efforts of a userland
scheduler, which cannot reasonably predict the process's future
workload, while the process itself often can. So the userland
scheduler won't work itself into a corner by moving a thread that
isn't doing much work onto a less loaded CPU, then noticing that
thread suddenly spring to life and having to re-balance everything
again. If the database knows it will have roughly equal load on N
threads, it can bind them evenly across the available CPUs, and the
scheduler will not be allowed to migrate those threads.
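
As a rough illustration of that kind of binding, here is a minimal
sketch. It uses the Linux-style pthread_setaffinity_np() call purely
as a stand-in, since the DragonFly interface being discussed in this
thread does not exist yet; the idea is the same either way: pin
worker i to CPU i and take those threads out of the balancer's hands.

/*
 * Sketch only: bind NWORKERS threads one-to-one onto CPUs using the
 * glibc pthread_setaffinity_np() API as an illustration. Error
 * handling omitted for brevity; assumes at least NWORKERS CPUs.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NWORKERS 4

static void *worker(void *arg)
{
	long id = (long)arg;
	/* ... roughly equal work per thread, as a database might do ... */
	printf("worker %ld running\n", id);
	return NULL;
}

int main(void)
{
	pthread_t tid[NWORKERS];
	long i;

	for (i = 0; i < NWORKERS; i++) {
		cpu_set_t mask;

		pthread_create(&tid[i], NULL, worker, (void *)i);

		/*
		 * Pin thread i to CPU i. The thread may run briefly
		 * elsewhere before this takes effect; a stricter version
		 * would set the mask on a pthread_attr_t before creation.
		 * Once pinned, the scheduler may no longer migrate it.
		 */
		CPU_ZERO(&mask);
		CPU_SET(i, &mask);
		pthread_setaffinity_np(tid[i], sizeof(mask), &mask);
	}
	for (i = 0; i < NWORKERS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}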
Since any migration of a thread involves invalidating the processor
caches, and probably a bus-locked cycle as well (I don't know, I'm
not an SMP hacker), it ends up being very expensive for the entire
system, and gets tremendously worse on massively parallel machines
with bad interconnects (e.g. a quad Xeon). Avoiding unnecessary
migration (i.e. any time it won't actually achieve anything in the
long run) results in better overall performance. If the database or
whatever uses inter-thread communication, it will also perform better
by having the threads actually work in parallel when they should,
instead of picking up their next message on the next time slice on
the same processor.
It's a typical issue - if the userland program knows how it will work
and how to leverage that for best performance, then in general it
*should* help the kernel understand that. This is relevant to paging,
caching, scheduling, IO event aggregation (i.e. waiting to collect
more events in a single kqueue call and thus saving system calls) -
everything. However, not all of the facilities to notify the kernel
of what shortcuts it can take are there, and those that are there
often aren't portable. So the kernel has to make a heroic (and
fundamentally impractical) effort to predict the behavior of the
process, an effort which still cannot predict how the process will
use the data it is given, and so on...
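
For concreteness, two of those hinting interfaces do exist today,
even if they aren't universally portable: madvise() for paging
patterns, and kqueue's ability to hand back many events from a single
kevent() call. A minimal sketch; error handling is omitted, and the
mapping, kqueue descriptor and fd are assumed to be set up by the
caller:

/*
 * Sketch: hint the VM system about a sequential read pattern, and
 * batch IO event delivery so one system call services many events.
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/mman.h>

void hint_examples(void *buf, size_t len, int kq, int fd)
{
	struct kevent change;
	struct kevent events[64];
	struct timespec timeout = { 1, 0 };	/* wait up to 1s to batch */
	int i, n;

	/* Tell the VM system we will read this mapping sequentially. */
	madvise(buf, len, MADV_SEQUENTIAL);

	/* Register interest in readability of fd. */
	EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
	kevent(kq, &change, 1, NULL, 0, NULL);

	/* One syscall can return up to 64 pending events at once. */
	n = kevent(kq, NULL, 0, events, 64, &timeout);
	for (i = 0; i < n; i++) {
		/* ... process events[i] ... */
	}
}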
In a perfect world we'd have a standard API set which *does* include
these hinting calls across all worthy platforms, and kernels that
honour them properly, and the performance of our machines would be
measurably, often visibly, higher.
 -- Dmitri Nikulin




