Load balancing between CPUs

Tue May 2 19:35:20 PDT 2006

:Hello,
:
:As a part of my lottery scheduler work, I'm trying to figure out
:how to implement load balancing between CPUs.
:
:As I understand it, the goal is equal load of all CPUs. Periodically
:we obtain number of ready-to-execute processes on each CPU (per-cpu
:runqueue length) and 'push' processes from high-loaded CPUs to 
:low-loaded. This scheme is quite simple. The problem is CPU affinity,
:which we didn't count. 
:
:Here is what I'm thinking about:
:	- when new process created via fork(), usched 'pushed' it
:	  to the first most low-loaded CPU;

    You will want to run it on the same cpu as the parent initially.
    Then just let it migrate naturally, or push actively when it does
    an exec().

    The reason is that the system does not know as of when the fork()
    is issued whether the thread is forking a new instance or
    whether it is doing a fork/exec to run another program.  If you
    actively push the newly forked process to another cpu, you
    instantly blow the caches on both cpus.

:	- add to the struct lwp field 'lastcpu' to track the CPU
:	  on which this lwp was scheduled last time;

    That is probably best left to the per-lwp scheduler data
    (the lwp_usdata pointer).  The current bsd4 scheduler stores
    its last cpu field there already.

:	- followed Matt's mail on Sat, 26 Nov 2005, add to the 
:	  struct lwp field 'desiredcpu', which can be setup by
:	  usched's setcpumask();
:	- balancing scheme: first we try to find on most high-loaded 
:	  CPU lwp with 'desiredcpu' equal to most low-loaded and then 
:	  'pushed' it on success. If we can't, doing the same but 
:	  with 'lastcpu' field. If both previous are failed, pick up
:	  a random lwp and 'push' it to the most low-loaded CPU, breaking
:	  cpu affinity.. Do this step until runqueues on both CPUs
:	  will be equal (+/- 1).

    It is probably best to just schedule the thread on the last cpu
    it ran on, and then let the scheduler move the thread to another
    cpu if it doesn't get cpu 'quickly enough' (whatever we define that
    to mean).

:With this scheme we balance only 2 CPUs at one balancing cycle. Maybe 
:we need to do this for more than one CPU pair..

    Doing the migration out of band also makes balancing across multiple
    cpus easier, since you can construct a dedicated algorithm to do
    the balancing which has access to a lot more statistical information
    rather than attempting to figure it out when the thread is scheduled
    to run.

    The reason you want to do this is because it is virtually impossible
    to characterize how much cpu the target thread will need before it
    goes to sleep again.  If you schedule the target thread to another
    cpu when it would otherwise only require a few cycles of cpu before
    going to sleep again, you again blow both cpu's caches and add a lot
    of overhead that winds up being wasted.

    For example, when the scheduler does a normal thread switch and must
    select the 'next' thread to run on the current cpu, it could also
    check other threads in the run queue that haven't been able to run
    yet and determine whether it makes sense to migrate some of them.

    Alternatively, one could have a 'helper' thread that runs a certain
    number of times a second, like 10 times a second, on each cpu, which
    implements a more sophisticated balancing algorithm.  The advantage of
    this is that the algorithm then takes a small fixed amount of cpu that
    is entirely independant of the actual machine load, and you do not
    clutter the normal 'critical' switching path with all sorts of code.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>

:Eventually, we will have (I hope) fully-supported 1:1 threading. So,
:maybe we need to do some effort on scheduling different threads of
:one multithreaded process on different CPUs?
:
:Feedback and comments are appreciated.
:
:--
:Sergey Glushchenko