git: kernel - Major performance changes to VM page management.
Matthew Dillon
dillon at crater.dragonflybsd.org
Thu Oct 8 14:32:19 PDT 2009
commit 0e8bd897b2ebcf1a575536f3bfdd88fe2377cc27
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Thu Oct 8 14:20:13 2009 -0700
kernel - Major performance changes to VM page management.
This commit significantly changes the way the kernel caches VM pages.
Essentially what happens now is that vnodes and VM pages which are
accessed often wind up in the VM active queue and are last in line
for recycling, while vnodes and VM pages which are only accessed once
or twice wind up on the VM inactive queue and are inserted into the
middle of the recycling list.
Previously, vnodes were essentially recycled in an LRU fashion, and
due to algorithmic design issues VM pages associated with files
scanned via open()/read() also wound up being recycled in an LRU
fashion. This caused relatively often-used data to be recycled far
too early in the face of large filesystem scans (tar, rdist, cvs, etc).
In the new scheme vnodes and VM pages are essentially split into two
camps: Those which are used often and those which are only used once
or twice. The ones used often wind up in the VM active queue (and
their vnodes are last on the list of vnodes which can be recycled),
and the ones used only once or twice wind up in the VM inactive queue.
The cycling of a large number of files from single-use scans (tar, rdist,
cvs, etc on large data sets) now recycles only within the inactive set
and does not touch the active set AT ALL. So, for example, files
accessed often by a shell or other programs tend to remain cached
permanently.
Permanence here is a relative term. Given enough memory pressure
such files WILL be recycled. But single-use scans even of huge
data sets will not create this sort of memory pressure. Examples
of how active VM pages and vnodes will get recycled include:
(1) Too many pages or vnodes wind up being marked as active.
(2) Memory pressure created by anonymous memory from running processes.
Technical Description of changes:
* The buffer cache is limited. For example, on a 3G system the buffer
cache only manages around 200MB. The VM page cache, on the other hand,
can cover all available memory.
This means that data can cycle in and out of the buffer cache at a
much higher rate than it would from the VM page cache.
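To make the rate disparity concrete, here is a back-of-the-envelope
sketch in C using the illustrative numbers above (the ~15x factor is
simple arithmetic, not a measurement):

    #include <stdio.h>

    int main(void) {
        double page_cache_mb = 3072.0; /* ~3G system, from the example */
        double buf_cache_mb  = 200.0;  /* buffer cache, from the example */

        /* At the same I/O rate, the smaller cache must recycle its
         * contents proportionally faster. */
        printf("buffer cache turns over ~%.0fx faster\n",
               page_cache_mb / buf_cache_mb);
        return 0;
    }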
* VM pages were losing their activity history (m->act_count) when
wired into the buffer cache to back buffers. Because the buffer cache
only manages around 200MB, VM pages were being cycled in and out of
the buffer cache over a much shorter period than they would otherwise
survive in the VM page queues.
This caused VM pages to be recycled in more of an LRU fashion instead
of based on usage, particularly the VM pages for files accessed with
open()/read().
VM pages now retain their activity history, and the history is also
updated even while the VM pages are owned by the buffer cache.
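A minimal sketch of the retained-history idea, using a hypothetical
page structure rather than the real vm_page (ACT_MAX and ACT_ADVANCE
are illustrative constants, not the kernel's):

    #define ACT_MAX      64
    #define ACT_ADVANCE   3

    struct page_model {
        int act_count;   /* activity history */
        int wire_count;  /* nonzero while backing a buffer */
    };

    /* Old behavior (roughly): returning the page from the buffer
     * cache discarded its history, so recycling degenerated to LRU. */
    static void unwire_old(struct page_model *m) {
        m->wire_count--;
        m->act_count = 0;
    }

    /* New behavior: the history survives the wire/unwire round trip,
     * and accesses made while the buffer cache owns the page still
     * bump it. */
    static void page_accessed(struct page_model *m) {
        if (m->act_count < ACT_MAX)
            m->act_count += ACT_ADVANCE;
    }

    static void unwire_new(struct page_model *m) {
        m->wire_count--;  /* act_count deliberately left intact */
    }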
* Files accessed just once, for example in a large 'tar', 'find', or 'ls',
could cause vnodes for files accessed numerous times to get kicked out
of the vnode free list. This could occur due to an edge case when
many tiny files are iterated (such as in a cvs update), on machines
with 2G or more of memory. In these cases the vnode cache would reach
its maximum number of vnodes without the VM page cache ever coming
under pressure, forcing the VM system to throw away vnodes. The VM
system invariably chose vnodes with small numbers of cached VM pages
(which is what we desire), but wound up choosing them in strict LRU
order regardless of whether the vnode was for a file accessed just
once or for a file accessed many times.
More Technical Description of changes:
* The buffer cache now inherits the highest m->act_count from the VM
pages backing it, and updates its tracking b_act_count whenever the
buffer is getblk()'d (and HAMMER does it manually for buffers
it attaches to internal structures).
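A rough sketch of the inheritance step, with hypothetical structure
and function names standing in for the real buf/vm_page code:

    struct buf_model {
        int page_act[16];  /* act_count of each backing VM page */
        int npages;
        int b_act_count;   /* buffer's own activity tracking */
    };

    /* Called when a buffer is acquired, as getblk() would do:
     * inherit the highest act_count among the backing pages. */
    static void buf_inherit_act(struct buf_model *bp) {
        for (int i = 0; i < bp->npages; i++) {
            if (bp->page_act[i] > bp->b_act_count)
                bp->b_act_count = bp->page_act[i];
        }
    }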
* VAGE in the vnode->v_flag field has been changed to VAGE0 and
VAGE1 (a 2 bit counter). Vnodes start out marked as being fully
aged (count of 3) and the count is decremented every time the
vnode is opened.
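A sketch of the two-bit counter; the actual flag values in
sys/sys/vnode.h may differ, so treat these definitions as
illustrative only:

    #define VAGE0      0x0001        /* low bit of the age counter  */
    #define VAGE1      0x0002        /* high bit of the age counter */
    #define VAGE_MASK  (VAGE0 | VAGE1)

    /* A new vnode starts out fully aged (count == 3). */
    static void vnode_init_age(int *v_flag) {
        *v_flag |= VAGE_MASK;
    }

    /* Each open() decrements the age; a vnode that reaches 0 is
     * treated as frequently used. */
    static void vnode_open_age(int *v_flag) {
        int age = *v_flag & VAGE_MASK;
        if (age)
            *v_flag = (*v_flag & ~VAGE_MASK) | (age - 1);
    }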
* When a vnode is placed on the vnode free list, aged vnodes are
now inserted into the middle of the list while non-aged vnodes
are inserted at the end, so aged vnodes get recycled first.
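A sketch of that placement policy using <sys/queue.h> and a
hypothetical midpoint sentinel (the real free list management is
more involved):

    #include <sys/queue.h>

    struct vnode_model {
        TAILQ_ENTRY(vnode_model) freelist;
        int aged;  /* nonzero: file was only used once or twice */
    };
    TAILQ_HEAD(freelist_head, vnode_model);

    /* Assume this sentinel is already threaded onto the list to
     * mark its midpoint.  Recycling scans from the head, so
     * anything inserted before the sentinel is reclaimed earlier. */
    static struct vnode_model freelist_mid;

    static void vnode_free_insert(struct freelist_head *fl,
                                  struct vnode_model *vp) {
        if (vp->aged)
            TAILQ_INSERT_BEFORE(&freelist_mid, vp, freelist);
        else
            TAILQ_INSERT_TAIL(fl, vp, freelist);
    }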
* VM pages returned from the buffer cache are now placed in the
inactive queue or the active queue based on m->act_count. This
works properly now that we do not lose the activity state when
wiring and unwiring the VM page for buffer cache backings.
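A minimal sketch of the placement decision; the threshold is
illustrative, not the kernel's actual cutoff:

    enum page_queue { PQ_ACTIVE, PQ_INACTIVE };

    #define ACT_THRESHOLD 8  /* illustrative cutoff only */

    /* Decide where a page returned from the buffer cache goes,
     * now that act_count survives the wire/unwire cycle. */
    static enum page_queue place_returned_page(int act_count) {
        return (act_count >= ACT_THRESHOLD) ? PQ_ACTIVE : PQ_INACTIVE;
    }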
* The VM system now sets a much larger inactive page target, 1/4
of available memory. This, combined with the vnode reclamation
algorithm (which reclaims 1/10 of the active vnodes in the system),
is now responsible for regulating the distribution of 'active'
pages versus 'inactive' pages.
It is important to note that the inactive page target and the
vnode reclamation algorithm set a minimum size for pages and
vnodes intended to be on the inactive side of the ledger. Memory
pressure from having too many active pages or vnodes will cause
VM pages to move to the inactive side. But, as already mentioned,
the simple one-time cycling of files such as in a tar, rdist, or
other file scan will NOT cause this sort of memory pressure.
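The two sizing rules described above reduce to simple fractions; a
sketch, with the caveat that the exact kernel formulas may differ:

    /* Inactive page target: 1/4 of available memory. */
    static long inactive_page_target(long avail_pages) {
        return avail_pages / 4;
    }

    /* Vnode reclamation: recycle 1/10 of the active vnodes when
     * the vnode limit is reached. */
    static long vnode_reclaim_count(long active_vnodes) {
        return active_vnodes / 10;
    }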
Negative aspects of the patch:
* Very large data sets which might have previously fit in memory
but do not fit in e.g. 1/2 of available memory will no longer
be fully cached.
This is an either-or trade-off: we can't prevent active pages from
being recycled unless we reduce the amount of data we allow to be
cached from 'one-time' uses before starting to recycle that data.
-Matt
Summary of changes:
.../linux/i386/linprocfs/linprocfs_subr.c | 2 +-
sys/kern/vfs_bio.c | 30 ++++++++++-
sys/kern/vfs_lock.c | 49 +++++++++++++++---
sys/kern/vfs_mount.c | 55 ++++++++++++++++++-
sys/kern/vfs_vopops.c | 13 +++++
sys/sys/buf.h | 3 +-
sys/sys/buf2.h | 25 +++++++++
sys/sys/vnode.h | 8 ++--
sys/vfs/hammer/hammer.h | 1 +
sys/vfs/hammer/hammer_io.c | 11 ++++
sys/vfs/hammer/hammer_ondisk.c | 3 +
sys/vfs/nfs/nfs_vfsops.c | 4 +-
sys/vm/vm_pageout.c | 12 ++++-
test/debug/vnodeinfo.c | 16 +++++-
usr.sbin/pstat/pstat.c | 2 +-
15 files changed, 208 insertions(+), 26 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/0e8bd897b2ebcf1a575536f3bfdd88fe2377cc27
--
DragonFly BSD source repository