large ncpus / memory support going in, HEADS UP - master may have some instability
Matthew Dillon
dillon at apollo.backplane.com
Mon Dec 20 12:05:06 PST 2010
:How do these scheduler changes relate to say, userland preemption or
:things of that nature (thinking soft-realtime, scheduling priority, etc) ?
:
:I know in the past in general you've said that's not really a design goal
:- but, anyhow, seems to relate based on the commit messages
:
:cheers
:
:- Chris
We're not going to be doing soft realtime though there is nothing
preventing someone from simply dedicating specific cpus to a process
(which the userland scheduler can already do). Well, I should say
that *I* am not going to be doing soft realtime, but remember we have
a pluggable scheduler framework so it would be possible to dedicate
a subset of cpus to a different scheduler mechanic if someone wanted
to do that work.
The current scheduler changes are mostly trying to preserve the existing
pecking order. That is, when a user process with a high dynamic priority
gets scheduled it is supposed to immediately preempt a user process with
a lower dynamic priority if no cpus are otherwise available. Doing
this correctly is a lot harder than it sounds.
The biggest issue on the 48-core box was dealing with token contention
causing havoc with inter-cpu cache management messages. The latest
commit addresses the issue by having the LWKT scheduler on a given
cpu resequence its spinning/polling vs other LWKT schedulers on other
cpus by using a FIFO counter mechanic with atomic_fetchadd_int() and
MONITOR/MWAIT. So far it seems to work extremely well when dealing
with situations where substantially all the cpus are contending
on the same token.
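
To make the FIFO counter idea concrete, here is a minimal userland sketch of a ticket-style lock, assuming C11 <stdatomic.h> in place of the kernel's atomic_fetchadd_int() and a plain spin loop in place of MONITOR/MWAIT. The struct and the fifo_* function names are hypothetical, chosen only for illustration; this is not the actual LWKT scheduler code, just the general technique of handing out ordered tickets with a fetch-add so that contenders proceed in arrival order rather than all hammering the same cache line at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical ticket-style FIFO lock. All names are illustrative;
 * this is not the DragonFly LWKT scheduler code.
 */
struct fifo_lock {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to proceed */
};

static void fifo_init(struct fifo_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Each contender takes a unique, ordered ticket with one atomic fetch-add. */
static unsigned int fifo_take_ticket(struct fifo_lock *l)
{
    return atomic_fetch_add_explicit(&l->next_ticket, 1u,
                                     memory_order_relaxed);
}

/* True once every earlier ticket holder has released. */
static bool fifo_is_my_turn(struct fifo_lock *l, unsigned int ticket)
{
    return atomic_load_explicit(&l->now_serving,
                                memory_order_acquire) == ticket;
}

/* Take a ticket and wait for it to come up; returns the ticket held. */
static unsigned int fifo_acquire(struct fifo_lock *l)
{
    unsigned int ticket = fifo_take_ticket(l);

    /* Plain spin here; a kernel could wait with MONITOR/MWAIT instead. */
    while (!fifo_is_my_turn(l, ticket))
        ;
    return ticket;
}

/* Pass the lock to the next ticket in line. */
static void fifo_release(struct fifo_lock *l, unsigned int ticket)
{
    assert(fifo_is_my_turn(l, ticket));
    atomic_store_explicit(&l->now_serving, ticket + 1u,
                          memory_order_release);
}
```

A caller would bracket its critical section with fifo_acquire() and fifo_release(), passing the returned ticket back on release. The point of the design is that each waiter watches for its own turn in a fixed order, which keeps the handoff fair under heavy contention.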
Right now the limitation I am hitting on the parallel buildworld on
the 48-core is the VM system (vm_token). I am reviewing the code and
determining how best to make it more fine-grained so the processes
do not compete with each other during VM faults.
p.s. Supermicro support is coming along well. The if_igb driver is
working fairly well now in polling mode. What we need is an MPS disk
driver port from FreeBSD to get access to the Supermicro's LSI RAID
chips & SATA ports. The AMD AHCI SATA ports are accessible via our
AHCI driver. Stability is very good. My make -j 40 buildworld loop
has been running continuously with no problems since my last bug-fix
commit. Of course we do not support NUMA yet so the memory use is
not localized to the cpus (yet), but that is only a big issue for a
subset of problems which blow out the Opteron's monster L3 cache, such as,
say, raytracing.
-Matt
Matthew Dillon
<dillon at backplane.com>