kernel work week of 3-Feb-2010 HEADS UP

Wed Feb 3 11:36:37 PST 2010

    A few kernel structures will be changing sizes this week so people
    following the development branch should do full kernel compiles
    instead of incremental kernel compiles to be safe.

    This week and next week I will be getting in a few SSDs and
    implementing a cache-data-in-swap feature.

    Over the last decade memory has tended to go up in machines (even
    crummy consumer machines) and as a consequence swap use has gone down.
    Most high-performance web servers and other deployments specifically
    avoid paging to swap due to the resulting loss of performance.  This
    has made swap space less useful on running systems.

    At the same time SSDs (Solid state drives) have become powerful enough
    that they could be used as a data cache for normal hard drives.  In
    many respects, hard drives have gotten so large (2TB+) that the measly
    4-16G of RAM systems typically have isn't really enough to efficiently
    cache the active data set.  Even a small SSD, say 40G, would
    provide a huge performance boost to systems serving large data-sets,
    not to mention revitalize older servers and make data serving a lot
    more cost effective by allowing less sophisticated workstation-class
    or even consumer-class hardware to be used.

    --

    What I will be doing this week and possibly into next week is
    implementing a system-wide cache-to-swap feature for objects backed
    by the VM system.  Such objects include file data and filesystem
    meta-data. The type of data/meta-data to cache will be selectable with
    a sysctl and the write-bandwidth to the SSD will also be manageable.
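
    As a rough illustration of the two knobs described above, the sketch
    below gates each candidate write on a type mask and a per-second
    byte budget.  Every name here (CACHE_DATA, CACHE_META, struct
    cache_policy, policy_tick, policy_may_write) is hypothetical and
    made up for the example; none of it is the actual sysctl or kernel
    interface, and the once-per-second budget is just one simple way to
    limit write bandwidth.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type bits; stand-ins for whatever the sysctl selects. */
#define CACHE_DATA 0x1u   /* file data */
#define CACHE_META 0x2u   /* filesystem meta-data */

struct cache_policy {
    unsigned enable_mask;     /* which kinds may be cached on the SSD */
    uint64_t max_write_rate;  /* bytes per second allowed to the SSD */
    uint64_t budget;          /* bytes still allowed in this second */
};

/* Assumed to be called once per second: refill the write budget. */
static void
policy_tick(struct cache_policy *p)
{
    p->budget = p->max_write_rate;
}

/* Return true and charge the budget if this write may go to the SSD. */
static bool
policy_may_write(struct cache_policy *p, unsigned kind, uint64_t bytes)
{
    if ((p->enable_mask & kind) == 0)
        return false;         /* this kind of data is not selected */
    if (bytes > p->budget)
        return false;         /* would exceed the write-bandwidth cap */
    p->budget -= bytes;
    return true;
}
```

    Refusing a write is harmless in this scheme because the cache only
    ever holds clean copies; the data simply stays uncached on the SSD.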

    The feature will allow a person to deploy system-configured swap space
    on an SSD and then use that space to cache 'clean' data and meta-data
    which also exists on the normal hard drives that filesystems use.  If
    the data is available on the SSD the system will issue reads from the
    SSD instead of the HD when the data/meta-data is not otherwise cached
    in RAM.  This feature works differently from normal paging in that
    it will operate even if there is no memory pressure per se.
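
    The lookup order described above can be sketched roughly as follows.
    This is only an illustration of the decision, not the real code: the
    names (enum read_source, struct page_state, choose_read_source) are
    hypothetical and do not correspond to actual kernel structures.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the actual kernel API. */
enum read_source { SRC_RAM, SRC_SWAP_CACHE, SRC_DISK };

struct page_state {
    bool in_ram;         /* already cached in memory */
    bool in_swap_cache;  /* a clean copy exists in SSD-backed swap */
};

/* Pick where a read is satisfied from: RAM first, then SSD, then HD. */
static enum read_source
choose_read_source(const struct page_state *ps)
{
    if (ps->in_ram)
        return SRC_RAM;
    if (ps->in_swap_cache)
        return SRC_SWAP_CACHE;
    return SRC_DISK;
}
```

    Since the SSD copy is always clean, losing it only means the read
    falls through to the hard drive.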

    One advantage of this mechanism versus an integrated filesystem
    solution (such as used by ZFS) is that it is system-wide and will
    work with any filesystem, and the cache is completely throw-away (and
    being swap space it will be thrown away on reboot anyway).  So this
    will be more of a flexible turn-on-and-forget type of feature.  For
    example, you can trivially select how small or large an SSD to use.

    I think this is going to be a very, very cool feature.  It also turns
    out to be fairly low-hanging fruit now.  In recent months the buffer
    cache has been cleaned up and last year the swap system was updated to
    support upwards of 512G of swap (64G on i386 due to KVM limitations).
    Plus precursor cleanups to the swap_pager turned out to be easy to do.

    It will make swap space useful again for those people willing to spend
    a bit on an SSD to improve their server performance.

    I expect it will take about 2 weeks to implement.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>