SMP (Was: Why did you choose DragonFly?)
Matthew Dillon
dillon at apollo.backplane.com
Tue Sep 21 10:50:42 PDT 2010
:That explains the noticeable performance difference just logging in... I
:always just thought avalon was getting used for something else I didn't
:know about...
Yah, the bulk build runs Avalon out of memory faster than it can swap
pages out because the bulk build is also loading the disk heavily
with reads. The pageout daemon just can't retire the data quickly
enough. That causes the VM system to stall on low real memory
for a few seconds every so often while the bulk build is running.
It shows the very real limitations of a single disk drive when no
swapcache/SSD is present.
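    (If you want to watch that kind of memory pressure on your own box
    the standard tools are enough -- roughly something like this, exact
    flags may vary:

	vmstat 1            # free pages, page-out rate, faults
	systat -vmstat 1    # same data, full-screen view
	swapinfo            # how much swap is actually in use

    During one of those stalls you'd expect to see free memory pinned
    near its minimum while the pageout rate can't keep up with the
    build's dirty pages.)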
Pkgbox64 does exactly the same bulk build as Avalon, with exactly
the same single-drive setup for the filesystem, but also has a 40G
SSD stuffed in, with swapcache enabled.  Ok, Pkgbox64 also has 4G of ram
instead of 3G (since it's running 64-bit), but that isn't why it
performs better. It performs better because the swapcache offloads
100% of the main disk's meta-data and the swap-based TMPFS is
entirely in the SSD.
Single-drive limitations are still present, but pushed way out on the
performance curve. The key fact here is that the SSD doesn't need
to be very large.  It's barely 40G (versus the 750G main disk) and yet
has a huge positive effect on the machine's performance.
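    (For anyone who hasn't tried it, the setup is basically just swap on
    the SSD plus a couple of sysctls.  A rough sketch -- the device name
    here is made up, and swapcache(8) is the authoritative reference for
    the knob names:

	# /etc/fstab: swap partition on the SSD, plus a swap-backed tmpfs
	/dev/da1s1b   none     swap    sw   0  0
	tmpfs         /build   tmpfs   rw   0  0

	# /etc/sysctl.conf: enable swapcache reads and meta-data caching
	vm.swapcache.read_enable=1
	vm.swapcache.meta_enable=1

    Data caching can be turned on as well, but the meta-data caching is
    where most of the win described above comes from.)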
:Is there benchmarks around for swapcache? i.e. same hardware, same
:software task, with and without swapcache?
:
Hmm. It's a bit hard to benchmark a machine under that sort of load
but I could do some blogbench tests.  It comes down to the filesystem
meta-data essentially only having to be read from disk once; from then
on, until the machine reboots, it is available in either system ram or
the swapcache SSD regardless of what else is going on in the system.
On the practical side the swapcache does not allow continuous
high-bandwidth writing to the SSD, since the SSD has to last a
reasonable period of time (10 years) before wearing out.  That
works well in real life but benchmarks compress the time scale so
for the benchmark to be accurate the write bandwidth limitations have
to be turned off.
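    (The rate limiting lives in the vm.swapcache burst sysctls --
    roughly an accumulation rate plus a burst cap.  From memory it
    looks something like this; check swapcache(8) for the real names
    and sane values:

	sysctl vm.swapcache.accrate    # long-term average write rate to the SSD
	sysctl vm.swapcache.maxburst   # cap on the short-term write burst
	sysctl vm.swapcache.curburst   # burst allowance accumulated so far

	# for a benchmark you'd crank the rate way up so the write
	# limiter doesn't dominate the results, e.g.
	sysctl vm.swapcache.accrate=100000000

    In normal operation the defaults are what keep the SSD from being
    written to death.)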
For example, on leaf, once the meta-data is read once after boot,
things like 'find' on local disk run very fast until the next reboot.
A find using meta-data cached in memory can run around 82000+ files
per second at the moment. With a data set large enough that main
memory cannot hold the meta-data, a find running through the
swapcached data on the SSD can run around 53000 fps.
leaf:/root# /usr/bin/time find /build | wc -l
32.81 real 2.57 user 28.69 sys
2701733 <---- purely from ram 82344 fps
leaf:/root# /usr/bin/time find /build /home | wc -l
89.13 real 4.49 user 64.22 sys
4775413 <---- doesn't fit in ram, SSD used 53578 fps
(fresh reboot, swapcache disabled)
leaf:/root# /usr/bin/time find /build /home | wc -l
916.00 real 5.92 user 62.48 sys
4775170 5213 fps
(repeat, swapcache disabled.. now depending on ram caching)
leaf:/root# /usr/bin/time find /build /home | wc -l
449.39 real 5.06 user 59.16 sys
4775175 10625 fps
(repeat, third run, swapcache disabled)
leaf:/root# /usr/bin/time find /build /home | wc -l
402.09 real 5.30 user 60.78 sys
4775177 11875 fps
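    (For reference, the fps figures are just the name count divided by
    wall-clock time:

	2701733 names /  32.81 sec  ~=  82,000/sec   (all from ram)
	4775413 names /  89.13 sec  ~=  53,600/sec   (ram + SSD swapcache)
	4775170 names / 916.00 sec  ~=   5,200/sec   (cold, swapcache disabled)
    )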
So the grand result in the case of leaf is that a nominally running
system with swapcache can do directory operations 5 times faster.
Short-term caching of smaller directory subsets will of course run
at full ram bandwidth, but on a machine like leaf there are always
a few things going on and it often just takes leaving your xterm for
a few minutes before your cached meta-data starts getting thrown
away.  Someone working on the machine, doing regular git pulls or
source tree searches, will also be regularly annoyed without
swapcache.
As you can see there is a huge difference in nominal name lookup
performance with a swapcache/SSD present when the filesystem(s)
are large enough such that normal ram caching is unable to hold
the data set.
Even in smaller systems where the filesystems are not so large
both normal and overnight activities (using firefox, overnight
locate.db, etc) tend to blow away what meta-data might have been
cached previously, not to mention cause active but idle programs
to get paged out. Even a smaller system such as a workstation can
seriously benefit from a swapcache/SSD setup.
Similarly when one is talking about a server running web services,
rsync services, mail, etc... those services tend to have large
meta-data footprints. rsync will scan the entire directory tree even for
incremental syncs. git clients and cvs servers and clients are also
heavy meta-data users. Someone running a large mail server can wind
up with a huge backlog of mail queue files. Swapcache greatly improves
the sustainable performance of those services.
-Matt
Matthew Dillon
<dillon at backplane.com>