Improvements in swapcache's ability to cache data using HAMMER double_buffer mode.
dillon at apollo.backplane.com
Mon Apr 4 13:53:16 PDT 2011
Normally data is only cached via the file vnode which means the cache
is blown away when the vnode gets cycled out of the vnode cache. With
kern.maxvnodes around ~100,000 on 32-bit systems and ~400,000 on 64-bit
systems, any filesystem which exceeds the limit will cause vnode recycling
to occur. Nearly all filesystems these days exceed these limits,
particularly on 32-bit systems. And on 64-bit systems files are often
not large enough to utilize available memory before hitting the vnode
limit and causing the data to be thrown away despite there being plenty
of free RAM.
It is now possible to bypass these limitations in DragonFly master
by enabling both the HAMMER double_buffer feature
(vfs.hammer.double_buffer=1) AND the swapcache data caching
feature (vm.swapcache.data_enable=1). See 'man swapcache' for
additional information on swapcache.
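Putting that into practice, a minimal /etc/sysctl.conf fragment enabling
both features would look like this (sysctl names as given above; check
'man swapcache' for the other tunables before enabling):

```
# Cache file data via HAMMER's block device rather than per-file vnodes
vfs.hammer.double_buffer=1
# Allow swapcache to cache file data, not just meta-data
vm.swapcache.data_enable=1
```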
When both features are enabled together swapcache will cache file data
via HAMMER's block device instead of via individual file vnodes, making
the swapcache'd data immune to vnode recycling. Swapcache is thus
able to cache the data for potentially millions of files up to 75%
of available swap (normally configured up to 32G on 32-bit systems and
up to 512G on 64-bit systems).
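To put numbers on the 75% figure, here is a quick back-of-the-envelope
calculation (a sketch, not DragonFly code) for the swap sizes mentioned
above:

```python
GiB = 1 << 30

def usable_swapcache(swap_bytes, fraction=0.75):
    """Swapcache can cache data up to ~75% of configured swap."""
    return int(swap_bytes * fraction)

# Typical maximum configurations: 32G swap (32-bit), 512G swap (64-bit)
print(usable_swapcache(32 * GiB) // GiB)    # -> 24
print(usable_swapcache(512 * GiB) // GiB)   # -> 384
```

So even the 32-bit configuration leaves roughly 24G of SSD-backed cache,
far more than the file data most vnode-limited workloads can keep in RAM.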
Now add the fact that SATA-III is now widely available on motherboards
and SATA-III SSDs are now in mass production. Intel's 510 series,
OCZ's Vertex III, and Crucial's C300 and M4 series are capable of
delivering 300-500 MBytes/sec reading and 200-400 MBytes/sec writing
from a single device. Crucial's C300 series is very cost effective
w/64GB at SATA-III speeds for $160. Compare this to the measly
2-5 MBytes/sec a hard drive can do in a random seek/read environment.
We're talking 100x the performance already with just a single SSD.
With swapcache this means being able to shrink the cost and the size of
what we might consider to be a 'server' by a factor of three or more.
The only downside to the new feature is that data is double-buffered in
ram. That is, file data is cached via the block device AND also via
the file vnode, and there is really no way to get around this other than to
expire one of the copies of the cached data more quickly (which we try
to do). I still consider the feature a bit experimental due to these
inefficiencies. We are definitely on the right track, and regardless of
the memory inefficiency the HD accesses go away for real when the
swapcache SSD can take the load instead.
On one of our older servers I can now grep through 950,000 files
(~15GB worth of file data) at ~2000-4000 files per second pulling
40-50 MBytes/sec from the SSD and *zero* activity on the HD. That is
a big deal that only a big whopping RAID system or a ton of RAM could
compete with prior to the advent of SSDs... all from a little $700 box
with an older $100 SSD in it.
<dillon at backplane.com>