git: kernel - Major refactor of pageout daemon algorithms
dillon at crater.dragonflybsd.org
Mon May 17 15:59:17 PDT 2021
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Mon May 17 13:04:56 2021 -0700
kernel - Major refactor of pageout daemon algorithms
* Rewrite a large chunk of the pageout daemon's algorithm to significantly
improve page selection for pageout on low-memory systems.
* Implement persistent markers for hold and active queue scans. Instead
of moving pages within the queues, we now implement a persistent marker
and just move the marker instead. This ensures 100% fair scanning of
these queues.
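The persistent-marker idea can be sketched as follows. This is an illustrative simplification using the standard TAILQ macros, not the actual sys/vm/vm_pageout.c code; the names (struct page, queue_scan) and the NULL-callback convention are invented for the example.

```c
#include <stddef.h>
#include <sys/queue.h>

struct page {
    TAILQ_ENTRY(page) listq;
    int is_marker;              /* marker nodes are skipped, never visited */
    int act_count;
};
TAILQ_HEAD(pglist, page);

/*
 * Visit up to 'count' pages following the persistent marker, then leave
 * the marker parked after the last page visited.  Pages themselves are
 * never moved, so successive calls walk the whole queue exactly once
 * per revolution of the marker -- a 100% fair scan.
 */
static int
queue_scan(struct pglist *q, struct page *marker, int count,
           void (*visit)(struct page *))
{
    int visited = 0;

    while (count-- > 0) {
        struct page *m = TAILQ_NEXT(marker, listq);

        if (m == NULL) {
            /* end of queue: wrap the marker back to the head */
            TAILQ_REMOVE(q, marker, listq);
            TAILQ_INSERT_HEAD(q, marker, listq);
            m = TAILQ_NEXT(marker, listq);
            if (m == NULL)      /* queue contains only the marker */
                break;
        }
        /* advance the marker past the page we are about to visit */
        TAILQ_REMOVE(q, marker, listq);
        TAILQ_INSERT_AFTER(q, m, marker, listq);
        if (!m->is_marker) {
            if (visit)
                visit(m);
            visited++;
        }
    }
    return visited;
}
```

Because the marker is re-inserted after the last page visited, a batch that stops early simply resumes from the same position on the next activation.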
* The pageout state machine is now governed by the following sysctls
(with some example default settings from a 32G box containing 8071042 pages).
The arrangement is as follows:
reserved < severe < minimum < wait < start < target1 < target2
* Paging is governed as follows: The pageout daemon is activated when
FREE+CACHE falls below (v_paging_start). The daemon will free memory
up until FREE+CACHE reaches (v_paging_target1), and then continue to
free memory up more slowly until FREE+CACHE reaches (v_paging_target2).
If, due to memory demand, FREE+CACHE falls below (v_paging_wait), most
userland processes will begin short-stalls on VM allocations and page
faults, and return to normal operation once FREE+CACHE goes above
(v_paging_wait) (that is, as soon as possible).
If, due to memory demand, FREE+CACHE falls below (v_paging_min), most
userland processes will block on VM allocations and page faults until
the level returns to above (v_paging_wait).
The hysteresis between (wait) and (start) allows most processes to
continue running normally during nominal paging activities.
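The ladder of thresholds above can be sketched as two small predicates. This is an illustrative model only: the struct, the function names, and the test values are invented, and the real kernel tracks considerably more state.

```c
/* Threshold ladder, ordered as in the commit:
 * reserved < severe < minimum < wait < start < target1 < target2 */
struct paging_thresholds {
    long reserved, severe, minimum, wait, start, target1, target2;
};

/* Hysteresis: the daemon activates when FREE+CACHE drops below (start)
 * and, once running, keeps freeing until it reaches (target2). */
static int
pageout_should_run(const struct paging_thresholds *p, long freecache,
                   int running)
{
    return freecache < (running ? p->target2 : p->start);
}

/* What userland experiences at a given FREE+CACHE level:
 * 0 = normal operation, 1 = short-stalls on VM allocations and page
 * faults, 2 = hard blocking until the level recovers. */
static int
alloc_pressure(const struct paging_thresholds *p, long freecache)
{
    if (freecache < p->minimum)
        return 2;
    if (freecache < p->wait)
        return 1;
    return 0;
}
```

The gap between (wait) and (start) is what lets the daemon run a full cycle while processes above (wait) never stall at all.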
* The pageout daemon operates in batches and then loops as necessary.
Pages will be moved from CACHE to FREE as necessary, then from INACTIVE
to CACHE as necessary, then from ACTIVE to INACTIVE as necessary. Care
is taken to avoid completely exhausting any given queue to ensure that
the queue scan is reasonably efficient.
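The cascade between queues can be sketched with a single helper that refills one queue from another without exhausting the source. The function name and the floor parameter are illustrative, not the kernel's actual interface.

```c
/* Move up to 'want' pages from *src to *dst, but never drain *src
 * below 'floor', mirroring the care taken to avoid completely
 * exhausting any given queue. */
static long
cascade_move(long *src, long *dst, long want, long floor)
{
    long avail = (*src > floor) ? *src - floor : 0;
    long n = (want < avail) ? want : avail;

    *src -= n;
    *dst += n;
    return n;
}
```

A batch would invoke this in order: CACHE to FREE, then INACTIVE to CACHE, then ACTIVE to INACTIVE, looping if the overall target has not yet been met.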
* The ACTIVE to INACTIVE scan has been significantly reorganized and
integrated with the page_stats scan (which updates m->act_count for
pages in the ACTIVE queue). Pages in the ACTIVE queue are no longer
moved within the lists. Instead a persistent roving marker is employed
for each queue.
The m->act_count test is made against a dynamically adjusted comparison
variable called vm.pageout_stats_actcmp. When no progress is made this
variable is increased, and when sufficient progress is made this variable
is decreased. Thus, under very heavy memory loads, a more permissive
m->act_count test allows active pages to be deactivated more quickly.
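The feedback loop on the comparison variable might look like the sketch below. The step size and ceiling are invented for illustration; the kernel's actual bounds are governed by the inamin/inalim sysctls.

```c
/* Raise the comparison when a scan fails to deactivate enough pages
 * (a higher actcmp is more permissive, since pages with
 * m->act_count <= actcmp are eligible for deactivation), and lower
 * it again once progress is sufficient. */
static int
actcmp_adjust(int actcmp, int deactivated, int needed)
{
    if (deactivated < needed)
        return (actcmp < 64) ? actcmp + 1 : actcmp;
    return (actcmp > 0) ? actcmp - 1 : 0;
}
```

Under sustained pressure the variable ratchets up until even fairly busy pages qualify; once the inactive queue refills, it decays back toward its resting value.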
* The INACTIVE to FREE+CACHE scan remains relatively unchanged. A two-pass
LRU arrangement continues to be employed in order to give the system
time to reclaim a deactivated page before it would otherwise get paged out.
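The two-pass idea can be sketched as a per-page decision. The struct and field names here are invented; the real code tracks this state in the vm_page flags.

```c
struct ipage {
    int referenced;     /* referenced bit, cleared each time it is seen */
    int second_pass;    /* set the first time the scan passes the page */
};

/* A freshly deactivated page survives its first scan pass; only a
 * page still unreferenced on a later pass is actually reclaimed,
 * giving the system time to re-reference a page before it would
 * otherwise get paged out. */
static int
inactive_should_reclaim(struct ipage *m)
{
    if (m->referenced) {
        m->referenced = 0;
        m->second_pass = 0;     /* referenced: restart the two-pass clock */
        return 0;
    }
    if (!m->second_pass) {
        m->second_pass = 1;     /* first unreferenced sighting: spare it */
        return 0;
    }
    return 1;                   /* second unreferenced sighting: reclaim */
}
```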
* The vm_pageout_page_stats() scan has been almost completely rewritten.
This scan is responsible for updating m->act_count on pages in the
ACTIVE queue. Example sysctl settings are shown below:
vm.pageout_stats_rsecs: 300 <--- passive run time (seconds) after pageout
vm.pageout_stats_scan: 472 <--- max number of pages to scan per tick
vm.pageout_stats_ticks: 10 <--- poll rate in ticks
vm.pageout_stats_inamin: 16 <--- inactive ratio governing dynamic
vm.pageout_stats_inalim: 4096 adjustment of actcmp.
vm.pageout_stats_actcmp: 2 <--- dynamically adjusted by the kernel
The page stats code polls slowly and will update m->act_count and
deactivate pages until it is able to achieve (v_inactive_target) worth
of pages in the inactive queue.
Once this target has been reached, the poll stops deactivating pages, but
will continue to run for (pageout_stats_rsecs) seconds after the pageout
daemon last ran (typically 5 minutes) and continue to passively update
m->act_count during this period.
The polling resumes upon any pageout daemon activation, and the cycle
repeats.
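The polling behavior described above amounts to a three-way mode decision each tick. This is a hand-written model, not kernel code; the parameter names echo the sysctls but the function itself is invented.

```c
/* Returns the page-stats scan mode for the current tick:
 * 2 = actively deactivate pages toward (v_inactive_target),
 * 1 = passively update m->act_count during the post-pageout window,
 * 0 = idle until the pageout daemon runs again. */
static int
page_stats_mode(long inactive, long inactive_target,
                long now, long last_pageout, long rsecs)
{
    if (inactive < inactive_target)
        return 2;
    if (now - last_pageout < rsecs)
        return 1;
    return 0;
}
```

With the example rsecs of 300, the scan keeps refreshing act_count for five minutes after the daemon last ran, then goes quiet.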
* The vm_pageout_page_stats() scan is mostly responsible for selecting
the correct pages to move from ACTIVE to INACTIVE. Choosing the correct
pages allows the system to continue to operate smoothly while concurrent
paging is in progress. The additional 5 minutes of passive operation
allows it to pre-stage m->act_count for pages in the ACTIVE queue to
help grease the wheels for the next pageout daemon activation.
* On a test box with memory limited to 2GB, running chrome. Video runs
smoothly despite constant paging. Active tabs appear to operate smoothly.
Inactive tabs are able to page-in decently fast and resume operation.
* On a workstation with 32GB of memory and a large number of open chrome
tabs, allowed to sit overnight (chrome burns up a lot of memory when tabs
remain open), then video was tested the next day. Paging appeared to operate
well and so far there has been no stuttering.
* On a 64GB build box running dsynth 32/32 (intentionally overloaded). The
full bulk starts normally. The packages tend to get larger and larger as
they are built. dsynth and the pageout daemon operate reasonably well in
this situation. I was mostly looking for excessive stalls due to heavy
memory loads and it looks like the new code handles it quite well.
Summary of changes:
sys/kern/vfs_bio.c | 8 +-
sys/kern/vfs_cluster.c | 2 +-
sys/kern/vfs_subr.c | 2 +-
sys/sys/vmmeter.h | 37 ++-
sys/vfs/ext2fs/ext2_vnops.c | 7 +-
sys/vfs/hammer/hammer_blockmap.c | 2 +-
sys/vfs/hammer/hammer_inode.c | 2 +-
sys/vfs/msdosfs/msdosfs_denode.c | 2 +-
sys/vfs/msdosfs/msdosfs_vnops.c | 2 +-
sys/vfs/tmpfs/tmpfs_subr.c | 6 +-
sys/vfs/tmpfs/tmpfs_vnops.c | 8 +-
sys/vfs/ufs/ffs_inode.c | 2 +-
sys/vfs/ufs/ufs_readwrite.c | 2 +-
sys/vm/swap_pager.c | 2 +-
sys/vm/vm_fault.c | 6 +-
sys/vm/vm_glue.c | 2 +-
sys/vm/vm_meter.c | 46 +--
sys/vm/vm_page.c | 115 ++++---
sys/vm/vm_page2.h | 261 ++++++++++++----
sys/vm/vm_pageout.c | 633 +++++++++++++++++++++++++--------------
sys/vm/vm_param.h | 24 +-
21 files changed, 781 insertions(+), 390 deletions(-)
DragonFly BSD source repository