git: kernel - Major refactor of pageout daemon algorithms
dillon at crater.dragonflybsd.org
Mon May 17 15:59:17 PDT 2021
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Mon May 17 13:04:56 2021 -0700
kernel - Major refactor of pageout daemon algorithms
* Rewrite a large chunk of the pageout daemon's algorithm to significantly
improve page selection for pageout on low-memory systems.
* Implement persistent markers for hold and active queue scans. Instead
of moving pages within the queues, we now implement a persistent marker
and just move the marker instead. This ensures 100% fair scanning of
these queues.
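The persistent-marker idea can be sketched as follows. This is an illustrative simplification using the standard TAILQ macros, not the actual sys/vm/vm_pageout.c code; the names (struct page, queue_scan) and the NULL-callback convention are invented for the example.

```c
#include <stddef.h>
#include <sys/queue.h>

struct page {
    TAILQ_ENTRY(page) listq;
    int is_marker;              /* marker nodes are skipped, never visited */
    int act_count;
};
TAILQ_HEAD(pglist, page);

/*
 * Visit up to 'count' pages following the persistent marker, then leave
 * the marker parked after the last page visited.  Pages themselves are
 * never moved, so successive calls walk the whole queue exactly once
 * per revolution of the marker -- a 100% fair scan.
 */
static int
queue_scan(struct pglist *q, struct page *marker, int count,
           void (*visit)(struct page *))
{
    int visited = 0;

    while (count-- > 0) {
        struct page *m = TAILQ_NEXT(marker, listq);

        if (m == NULL) {
            /* end of queue: wrap the marker back to the head */
            TAILQ_REMOVE(q, marker, listq);
            TAILQ_INSERT_HEAD(q, marker, listq);
            m = TAILQ_NEXT(marker, listq);
            if (m == NULL)      /* queue contains only the marker */
                break;
        }
        /* advance the marker past the page we are about to visit */
        TAILQ_REMOVE(q, marker, listq);
        TAILQ_INSERT_AFTER(q, m, marker, listq);
        if (!m->is_marker) {
            if (visit)
                visit(m);
            visited++;
        }
    }
    return visited;
}
```

Because the marker is re-inserted after the last page visited, a batch that stops early simply resumes from the same position on the next activation.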
* The pageout state machine is now governed by the following sysctls
(with some example default settings from a 32G box containing 8071042 pages).
The arrangement is as follows:
reserved < severe < minimum < wait < start < target1 < target2
* Paging is governed as follows: The pageout daemon is activated when
FREE+CACHE falls below (v_paging_start). The daemon will free memory
up until FREE+CACHE reaches (v_paging_target1), and then continue to
free memory up more slowly until FREE+CACHE reaches (v_paging_target2).
If, due to memory demand, FREE+CACHE falls below (v_paging_wait), most
userland processes will begin short-stalls on VM allocations and page
faults, and return to normal operation once FREE+CACHE goes above
(v_paging_wait) (that is, as soon as possible).
If, due to memory demand, FREE+CACHE falls below (v_paging_min), most
userland processes will block on VM allocations and page faults until
the level returns to above (v_paging_wait).
The hysteresis between (wait) and (start) allows most processes to
continue running normally during nominal paging activities.
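The ladder of thresholds above can be sketched as two small predicates. This is an illustrative model only: the struct, the function names, and the test values are invented, and the real kernel tracks considerably more state.

```c
/* Threshold ladder, ordered as in the commit:
 * reserved < severe < minimum < wait < start < target1 < target2 */
struct paging_thresholds {
    long reserved, severe, minimum, wait, start, target1, target2;
};

/* Hysteresis: the daemon activates when FREE+CACHE drops below (start)
 * and, once running, keeps freeing until it reaches (target2). */
static int
pageout_should_run(const struct paging_thresholds *p, long freecache,
                   int running)
{
    return freecache < (running ? p->target2 : p->start);
}

/* What userland experiences at a given FREE+CACHE level:
 * 0 = normal operation, 1 = short-stalls on VM allocations and page
 * faults, 2 = hard blocking until the level recovers. */
static int
alloc_pressure(const struct paging_thresholds *p, long freecache)
{
    if (freecache < p->minimum)
        return 2;
    if (freecache < p->wait)
        return 1;
    return 0;
}
```

The gap between (wait) and (start) is what lets the daemon run a full cycle while processes above (wait) never stall at all.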
* The pageout daemon operates in batches and then loops as necessary.
Pages will be moved from CACHE to FREE as necessary, then from INACTIVE
to CACHE as necessary, then from ACTIVE to INACTIVE as necessary. Care
is taken to avoid completely exhausting any given queue to ensure that
the queue scan is reasonably efficient.
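The cascade between queues can be sketched with a single helper that refills one queue from another without exhausting the source. The function name and the floor parameter are illustrative, not the kernel's actual interface.

```c
/* Move up to 'want' pages from *src to *dst, but never drain *src
 * below 'floor', mirroring the care taken to avoid completely
 * exhausting any given queue. */
static long
cascade_move(long *src, long *dst, long want, long floor)
{
    long avail = (*src > floor) ? *src - floor : 0;
    long n = (want < avail) ? want : avail;

    *src -= n;
    *dst += n;
    return n;
}
```

A batch would invoke this in order: CACHE to FREE, then INACTIVE to CACHE, then ACTIVE to INACTIVE, looping if the overall target has not yet been met.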
* The ACTIVE to INACTIVE scan has been significantly reorganized and
integrated with the page_stats scan (which updates m->act_count for
pages in the ACTIVE queue). Pages in the ACTIVE queue are no longer
moved within the lists. Instead a persistent roving marker is employed
for each queue.
The m->act_count test is made against a dynamically adjusted comparison
variable called vm.pageout_stats_actcmp. When no progress is made this
variable is increased, and when sufficient progress is made this variable
is decreased. Thus, under very heavy memory loads, a more permissive
m->act_count test allows active pages to be deactivated more quickly.
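The feedback loop on the comparison variable might look like the sketch below. The step size and ceiling are invented for illustration; the kernel's actual bounds are governed by the inamin/inalim sysctls.

```c
/* Raise the comparison when a scan fails to deactivate enough pages
 * (a higher actcmp is more permissive, since pages with
 * m->act_count <= actcmp are eligible for deactivation), and lower
 * it again once progress is sufficient. */
static int
actcmp_adjust(int actcmp, int deactivated, int needed)
{
    if (deactivated < needed)
        return (actcmp < 64) ? actcmp + 1 : actcmp;
    return (actcmp > 0) ? actcmp - 1 : 0;
}
```

Under sustained pressure the variable ratchets up until even fairly busy pages qualify; once the inactive queue refills, it decays back toward its resting value.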
* The INACTIVE to FREE+CACHE scan remains relatively unchanged. A two-pass
LRU arrangement continues to be employed in order to give the system
time to reclaim a deactivated page before it would otherwise get paged out.
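The two-pass idea can be sketched as a per-page decision. The struct and field names here are invented; the real code tracks this state in the vm_page flags.

```c
struct ipage {
    int referenced;     /* referenced bit, cleared each time it is seen */
    int second_pass;    /* set the first time the scan passes the page */
};

/* A freshly deactivated page survives its first scan pass; only a
 * page still unreferenced on a later pass is actually reclaimed,
 * giving the system time to re-reference a page before it would
 * otherwise get paged out. */
static int
inactive_should_reclaim(struct ipage *m)
{
    if (m->referenced) {
        m->referenced = 0;
        m->second_pass = 0;     /* referenced: restart the two-pass clock */
        return 0;
    }
    if (!m->second_pass) {
        m->second_pass = 1;     /* first unreferenced sighting: spare it */
        return 0;
    }
    return 1;                   /* second unreferenced sighting: reclaim */
}
```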
* The vm_pageout_page_stats() scan has been almost completely rewritten.
This scan is responsible for updating m->act_count on pages in the
ACTIVE queue. Example sysctl settings are shown below:
vm.pageout_stats_rsecs: 300 <--- passive run time (seconds) after pageout
vm.pageout_stats_scan: 472 <--- max number of pages to scan per tick
vm.pageout_stats_ticks: 10 <--- poll rate in ticks
vm.pageout_stats_inamin: 16 <--- inactive ratio governing dynamic
vm.pageout_stats_inalim: 4096 adjustment of actcmp.
vm.pageout_stats_actcmp: 2 <--- dynamically adjusted by the kernel
The page stats code polls slowly and will update m->act_count and
deactivate pages until it is able to achieve (v_inactive_target) worth
of pages in the inactive queue.
Once this target has been reached, the poll stops deactivating pages, but
will continue to run for (pageout_stats_rsecs) seconds after the pageout
daemon last ran (typically 5 minutes) and continue to passively update
m->act_count during this period.
The polling resumes upon any pageout daemon activation, and the cycle
repeats.
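The polling behavior described above amounts to a three-way mode decision each tick. This is a hand-written model, not kernel code; the parameter names echo the sysctls but the function itself is invented.

```c
/* Returns the page-stats scan mode for the current tick:
 * 2 = actively deactivate pages toward (v_inactive_target),
 * 1 = passively update m->act_count during the post-pageout window,
 * 0 = idle until the pageout daemon runs again. */
static int
page_stats_mode(long inactive, long inactive_target,
                long now, long last_pageout, long rsecs)
{
    if (inactive < inactive_target)
        return 2;
    if (now - last_pageout < rsecs)
        return 1;
    return 0;
}
```

With the example rsecs of 300, the scan keeps refreshing act_count for five minutes after the daemon last ran, then goes quiet.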
* The vm_pageout_page_stats() scan is mostly responsible for selecting
the correct pages to move from ACTIVE to INACTIVE. Choosing the correct
pages allows the system to continue to operate smoothly while concurrent
paging is in progress. The additional 5 minutes of passive operation
allows it to pre-stage m->act_count for pages in the ACTIVE queue to
help grease the wheels for the next pageout daemon activation.
* On a test box with memory limited to 2GB, running chrome. Video runs
smoothly despite constant paging. Active tabs appear to operate smoothly.
Inactive tabs are able to page-in decently fast and resume operation.
* On a workstation with 32GB of memory and a large number of open chrome
tabs, allowed to sit overnight (chrome burns up a lot of memory when tabs
remain open), then video was tested the next day. Paging appeared to operate
well and so far there has been no stuttering.
* On a 64GB build box running dsynth 32/32 (intentionally overloaded). The
full bulk starts normally. The packages tend to get larger and larger as
they are built. dsynth and the pageout daemon operate reasonably well in
this situation. I was mostly looking for excessive stalls due to heavy
memory loads and it looks like the new code handles it quite well.
Summary of changes:
sys/kern/vfs_bio.c | 8 +-
sys/kern/vfs_cluster.c | 2 +-
sys/kern/vfs_subr.c | 2 +-
sys/sys/vmmeter.h | 37 ++-
sys/vfs/ext2fs/ext2_vnops.c | 7 +-
sys/vfs/hammer/hammer_blockmap.c | 2 +-
sys/vfs/hammer/hammer_inode.c | 2 +-
sys/vfs/msdosfs/msdosfs_denode.c | 2 +-
sys/vfs/msdosfs/msdosfs_vnops.c | 2 +-
sys/vfs/tmpfs/tmpfs_subr.c | 6 +-
sys/vfs/tmpfs/tmpfs_vnops.c | 8 +-
sys/vfs/ufs/ffs_inode.c | 2 +-
sys/vfs/ufs/ufs_readwrite.c | 2 +-
sys/vm/swap_pager.c | 2 +-
sys/vm/vm_fault.c | 6 +-
sys/vm/vm_glue.c | 2 +-
sys/vm/vm_meter.c | 46 +--
sys/vm/vm_page.c | 115 ++++---
sys/vm/vm_page2.h | 261 ++++++++++++----
sys/vm/vm_pageout.c | 633 +++++++++++++++++++++++++--------------
sys/vm/vm_param.h | 24 +-
21 files changed, 781 insertions(+), 390 deletions(-)
DragonFly BSD source repository