git: kernel - Fix NUMA contention due to asymmetric memory

Matthew Dillon dillon at crater.dragonflybsd.org
Sun Oct 14 20:27:26 PDT 2018


commit 8e5d7c42c16ea1b7b0545d8d6a4bf42046da03b8
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date:   Sun Oct 14 20:09:47 2018 -0700

    kernel - Fix NUMA contention due to asymmetric memory
    
    * Fix NUMA contention in situations where memory is associated
      with CPU cores asymmetrically.  In particular, with the 2990WX,
      half the cores will have no memory associated with them.
    
    * This was forcing DFly to allocate memory from queues belonging to
      other nearby cores, causing unnecessary SMP contention as well
      as burning extra time iterating queues.
    
    * Fix by calculating the average number of free pages per-core,
      and then adjusting any VM page queue with fewer pages than the
      average by stealing pages from queues with more than the average.
      We use a simple iterator to steal pages, so the CPUs with less
      (or zero) direct-attached memory will operate more UMA-like
      (just on 4K boundaries instead of 256-1024 byte boundaries).
    
    * Tested with a 64-thread concurrent compile test.  systat -pv 1
      showed all remaining contention disappearing.  Literally *ZERO*
      contention when we run the test with each thread in its own jail
      with no shared resources.
    
    * NOTE!  This fix is specific to asymmetric NUMA configurations,
      which are fairly rare in the wild, and will not speed up more
      conventional systems.
    
    * Before and after timings on the 2990WX.
    
      cd /tmp/src
      time make -j 128 nativekernel NO_MODULES=TRUE > /dev/null
    
      BEFORE
      703.915u 167.605s 0:49.97 1744.0%       9993+749k 22188+8io 216pf+0w
      699.550u 171.148s 0:50.87 1711.5%       9994+749k 21066+8io 150pf+0w
    
      AFTER
      678.406u 108.857s 0:45.66 1724.1%       10105+757k 22188+8io 216pf+0w
      674.805u 115.256s 0:46.67 1692.8%       10077+755k 21066+8io 150pf+0w
    
      This is a 4.2 second difference on the second run, an improvement
      of over 8%, which is nothing to sneeze at.
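
The rebalancing idea described in the commit message (compute the per-core
average, then top up queues below the average by stealing from queues above
it, using a simple iterator) can be sketched roughly as follows. This is an
illustration only and is not the code in sys/vm/vm_page.c: the function name
rebalance(), the use of plain page counts instead of real VM page queues, and
the largest-possible-transfer policy are all assumptions made for the example.

```c
#include <assert.h>

/*
 * Hypothetical sketch: q[] holds the free page count of each per-core
 * queue (real queues hold pages, not counters).  Every queue below the
 * average is raised to the average by taking pages from queues above
 * it.  A rotating donor index spreads the stealing across donors
 * instead of always starting from the same queue.
 */
static void
rebalance(long *q, int n)
{
    long total = 0, avg;
    int i, donor = 0;

    assert(n > 0);
    for (i = 0; i < n; ++i)
        total += q[i];
    avg = total / n;            /* average free pages per queue */

    for (i = 0; i < n; ++i) {
        int scanned = 0;        /* donors skipped since last steal */

        while (q[i] < avg && scanned < n) {
            if (q[donor] > avg) {
                long want = avg - q[i];
                long spare = q[donor] - avg;
                long take = (want < spare) ? want : spare;

                q[donor] -= take;   /* never drops donor below avg */
                q[i] += take;
                scanned = 0;
            } else {
                ++scanned;
            }
            donor = (donor + 1) % n;
        }
    }
}
```

With eight queues where four start empty and four hold 100 pages each (loosely
mirroring a machine where half the cores have no attached memory), every queue
ends up with 50 pages, so each core can allocate from its own queue.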

Summary of changes:
 sys/kern/subr_cpu_topology.c         |  17 +++-
 sys/platform/pc64/acpica/acpi_srat.c |   1 +
 sys/sys/cpu_topology.h               |   2 +
 sys/vm/vm_page.c                     | 169 +++++++++++++++++++++++++++++++----
 sys/vm/vm_page.h                     |   8 +-
 5 files changed, 179 insertions(+), 18 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/8e5d7c42c16ea1b7b0545d8d6a4bf42046da03b8


-- 
DragonFly BSD source repository


