Improvements in swapcache's ability to cache data using HAMMER double_buffer mode.

Mon Apr 4 13:53:16 PDT 2011

    Normally data is only cached via the file vnode, which means the cache
    is blown away when the vnode gets cycled out of the vnode cache.  With
    kern.maxvnodes at ~100,000 on 32-bit systems and ~400,000 on 64-bit
    systems, any filesystem holding more files than the limit will cause
    vnode recycling to occur.  Nearly all filesystems these days exceed these
    limits, particularly on 32-bit systems.  And on 64-bit systems files are
    often not large enough to utilize available memory before hitting the
    vnode limit, causing the data to be thrown away despite there being
    plenty of free ram.

    It is now possible to bypass these limitations in DragonFly master
    by enabling both the HAMMER double_buffer feature
    (vfs.hammer.double_buffer=1) AND the swapcache data caching
    feature (vm.swapcache.data_enable=1).  See 'man swapcache' for
    additional information on swapcache.
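
    As a minimal sketch, both sysctls named above can be set at runtime
    with sysctl(8) as root.  Only the two variable names come from this
    note; swap on an SSD is assumed to be configured already, and any
    further swapcache tuning is left to 'man swapcache'.

```shell
# Run as root on a DragonFly master system with SSD swap already configured.
sysctl vfs.hammer.double_buffer=1
sysctl vm.swapcache.data_enable=1

# Read both values back to confirm the settings took effect.
sysctl vfs.hammer.double_buffer vm.swapcache.data_enable
```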

    When both features are enabled together swapcache will cache file data
    via HAMMER's block device instead of via individual file vnodes, making
    the swapcache'd data immune to vnode recycling.  Swapcache is thus
    able to cache the data for potentially millions of files up to 75%
    of available swap (normally configured up to 32G on 32-bit systems and
    up to 512G on 64-bit systems).
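
    To make the 75% figure concrete, here is a small sketch of the
    arithmetic for the two swap sizes mentioned above.  The function name
    swapcache_cap_gb is made up for illustration; it is not a system tool.

```shell
# Hypothetical helper: 75% of a configured swap size, in whole GB.
swapcache_cap_gb() {
    echo $(( $1 * 75 / 100 ))
}

swapcache_cap_gb 32     # 32G of swap on a 32-bit system
swapcache_cap_gb 512    # 512G of swap on a 64-bit system
```

    That works out to at most 24G and 384G of cacheable data respectively.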

    --

    Add to this the fact that SATA-III is now widely available on
    motherboards and SATA-III SSDs are in mass production.  Intel's 510
    series, OCZ's Vertex III, and Crucial's C300 and M4 series are capable of
    delivering 300-500 MBytes/sec reading and 200-400 MBytes/sec writing
    from a single device.  Crucial's C300 series is very cost effective,
    with 64GB at SATA-III speeds for $160.  Compare this to the measly
    2-5 MBytes/sec a hard drive can do in a random seek/read environment.
    We're talking 100x the performance already with just a single SSD
    swap device.

    With swapcache this means being able to shrink the cost and the size of
    what we might consider to be a 'server' by a factor of three or more.

    --

    The only downside to the new feature is that data is double-buffered in
    ram.  That is, file data is cached via the block device AND also via
    the file vnode, and there is really no way to get around this other than to
    expire one of the copies of the cached data more quickly (which we try
    to do).  I still consider the feature a bit experimental due to these
    inefficiencies.  We are definitely on the right track, though:
    regardless of the memory inefficiency, the HD accesses go away for real
    when the swapcache SSD can take the load instead.

    On one of our older servers I can now grep through 950,000 files
    (~15GB worth of file data) at ~2000-4000 files per second pulling
    40-50 MBytes/sec from the SSD and *zero* activity on the HD.  That is
    a big deal that only a big whopping RAID system or a ton of ram could
    compete with prior to the advent of SSDs... all from a little $700 box
    with an older $100 SSD in it.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>