large ncpus / memory support going in, HEADS UP - master may have some instability

Matthew Dillon dillon at apollo.backplane.com
Mon Dec 20 12:05:06 PST 2010


:How do these scheduler changes relate to say, userland preemption or 
:things of that nature (thinking soft-realtime, scheduling priority, etc) ?
:
:I know in the past in general you've said that's not really a design goal 
:- but, anyhow, seems to relate based on the commit messages
:
:cheers
:
:- Chris

    We're not going to be doing soft realtime though there is nothing
    preventing someone from simply dedicating specific cpus to a process
    (which the userland scheduler can already do).  Well, I should say
    that *I* am not going to be doing soft realtime, but remember we have
    a pluggable scheduler framework so it would be possible to dedicate
    a subset of cpus to a different scheduler mechanic if someone wanted
    to do that work.

    The current scheduler changes are mostly trying to preserve the existing
    pecking order.  That is, when a user process with a high dynamic priority
    gets scheduled it is supposed to immediately preempt a user process with
    a lower dynamic priority if no cpus are otherwise available.  Doing
    this correctly is a lot harder than it sounds.
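
    As a rough illustration of the pecking-order rule described above
    (not the actual DragonFly scheduler code), the decision can be
    sketched as follows.  The cpu_slot structure, the pick_cpu() name,
    and the convention that a larger number means a higher priority are
    all assumptions made for the example.  The hard parts in practice
    (races between cpus, IPIs, priority recalculation) are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu snapshot; not a real kernel structure. */
struct cpu_slot {
    int busy;   /* nonzero if a user process is currently running */
    int prio;   /* dynamic priority of that process (assumed: larger = higher) */
};

/*
 * Decide where a newly runnable process with priority new_prio should go.
 * Returns the index of an idle cpu if one exists, otherwise the index of
 * the cpu running the lowest-priority process if that process should be
 * preempted, otherwise -1 (the new process has to wait on the run queue).
 */
static int
pick_cpu(const struct cpu_slot *cpus, size_t ncpus, int new_prio)
{
    int victim = -1;
    size_t i;

    assert(cpus != NULL);
    for (i = 0; i < ncpus; i++) {
        if (!cpus[i].busy)
            return (int)i;          /* idle cpu available, no preemption */
        if (victim < 0 || cpus[i].prio < cpus[victim].prio)
            victim = (int)i;        /* remember the lowest-priority cpu */
    }
    if (victim >= 0 && cpus[victim].prio < new_prio)
        return victim;              /* preempt the lower-priority process */
    return -1;                      /* nothing to preempt */
}
```

    Preempting only when no cpu is idle matches the "if no cpus are
    otherwise available" condition in the paragraph above.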

    The biggest issue on the 48-core machine was dealing with token contention
    causing havoc with inter-cpu cache management messages.  The latest
    commit addresses the issue by having the LWKT scheduler on a given
    cpu resequence its spinning/polling vs other LWKT schedulers on other
    cpus by using a FIFO counter mechanic with atomic_fetchadd_int() and
    MONITOR/MWAIT.  So far it seems to work extremely well when dealing
    with situations where substantially all the cpus are contending
    on the same token.
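
    The FIFO counter idea can be sketched in portable C11 as a
    ticket-style sequencer.  This is only a minimal illustration, not
    the code from the commit: the fifo_seq structure and function names
    are hypothetical, atomic_fetch_add() stands in for the kernel's
    atomic_fetchadd_int(), and a plain polling loop stands in for
    MONITOR/MWAIT, which is not available to ordinary userland code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical FIFO sequencer; names are illustrative only. */
struct fifo_seq {
    atomic_uint next_ticket;    /* next ticket number to hand out */
    atomic_uint now_serving;    /* ticket currently allowed to proceed */
};

static void
fifo_seq_init(struct fifo_seq *fs)
{
    atomic_init(&fs->next_ticket, 0);
    atomic_init(&fs->now_serving, 0);
}

/* Take a place in line; one atomic fetch-and-add per contender. */
static unsigned
fifo_seq_take(struct fifo_seq *fs)
{
    return atomic_fetch_add(&fs->next_ticket, 1);
}

static int
fifo_seq_is_my_turn(struct fifo_seq *fs, unsigned ticket)
{
    return atomic_load(&fs->now_serving) == ticket;
}

/*
 * Wait until our ticket comes up.  A kernel could arm MONITOR on
 * now_serving and MWAIT here instead of hammering the cache line.
 */
static void
fifo_seq_wait(struct fifo_seq *fs, unsigned ticket)
{
    while (!fifo_seq_is_my_turn(fs, ticket))
        ;   /* poll */
}

/* Finished our attempt; let the next ticket holder go. */
static void
fifo_seq_done(struct fifo_seq *fs, unsigned ticket)
{
    assert(fifo_seq_is_my_turn(fs, ticket));
    atomic_fetch_add(&fs->now_serving, 1);
}
```

    Each contender calls fifo_seq_take() once, then fifo_seq_wait() and
    fifo_seq_done() around its attempt, so the cpus proceed in arrival
    order rather than all polling the contended token at the same time.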

    Right now the limitation I am hitting on the parallel buildworld on
    the 48-core is the VM system (vm_token).  I am reviewing the code and
    determining how best to make it more fine-grained so the processes
    do not compete with each other during VM faults.

    P.S. Supermicro support is coming along well.  The if_igb driver is
    working fairly well now in polling mode.  What we need is a port of
    the MPS disk driver from FreeBSD to get access to the Supermicro's
    LSI RAID chips & SATA ports.  The AMD AHCI SATA ports are accessible
    via our AHCI driver.  Stability is very good.  My make -j 40
    buildworld loop has been running continuously with no problems since
    my last bug-fix commit.  Of course we do not support NUMA yet, so
    memory use is not localized to the cpus (yet), but that is only a big
    issue for the subset of problems which blow out the Opteron's monster
    L3 cache, such as raytracing.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




