git: kernel - Implement RLIMIT_RSS, Increase maximum supported swap
dillon at crater.dragonflybsd.org
Tue Dec 27 23:08:16 PST 2016
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Tue Dec 27 18:34:26 2016 -0800
kernel - Implement RLIMIT_RSS, Increase maximum supported swap
* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.
* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).
* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.
* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
* Refactor the vm_pageout_deficit variable, using atomic ops.
* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.
* Refactor kern/subr_blist.c and all associated code that uses it to increase
swblk_t from int32_t to int64_t, and to increase the radix supported from
31 bits to 63 bits.
This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.
* There are two kernel data structures associated with swap: the blmeta
structure, which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure, whose KVA is reserved
but not allocated.
The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_objects. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to do that.
A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with more than 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.
* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
Summary of changes:
lib/libkvm/kvm_getswapinfo.c | 52 +-
sys/cpu/x86_64/include/param.h | 8 +-
sys/kern/subr_blist.c | 98 +--
sys/kern/subr_param.c | 6 +-
sys/platform/pc64/include/pmap.h | 4 +-
sys/platform/pc64/x86_64/trap.c | 2 +-
sys/platform/vkernel64/include/pmap.h | 4 +-
sys/sys/blist.h | 30 +-
sys/vm/swap_pager.c | 59 +-
sys/vm/swap_pager.h | 9 +-
sys/vm/vm_fault.c | 43 +-
sys/vm/vm_map.c | 10 +-
sys/vm/vm_map.h | 6 +-
sys/vm/vm_object.c | 4 +-
sys/vm/vm_object.h | 4 +-
sys/vm/vm_page.c | 4 +-
sys/vm/vm_pageout.c | 1133 +++++++++++++++++++--------------
sys/vm/vm_pageout.h | 3 +-
sys/vm/vm_pager.h | 2 +
sys/vm/vm_swap.c | 17 +-
20 files changed, 894 insertions(+), 604 deletions(-)