HAMMER update 06-Feb-2008
Matthew Dillon
dillon at apollo.backplane.com
Wed Feb 6 17:19:04 PST 2008
:How will this affect parallel IO (reads, but especially writes)? Would
:having such a global structure serialize it? (I'm assuming, possibly
:wrongly, that having trees per-cluster allowed you to lock individual
:clusters).
Reads will not be affected at all... the locking occurs at the B-Tree
node layer.
Writes will not be serialized and will still be asynchronous, so the
most typical striping setups on multi-disk filesystems should still
yield very high performance. Writes WILL be far more likely to be
sequential, which should actually improve write performance. Also
keep in mind that writes are buffered by the buffer cache, so there
is a caching layer between userland and the physical disk.
Mixed data writes (parallel write operations by multiple processes in
different parts of the filesystem) will generally lay down new
information sequentially on disk, which can be detrimental to read
performance since the individual files will not be entirely sequential.
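The read-side cost can be made concrete with a small model. This is a sketch under a stated assumption: each write is one block, and a log-style allocator hands out disk blocks strictly in arrival order, so the on-disk order equals the write order. The function `count_extents()` is hypothetical and simply counts how many contiguous runs a given file occupies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: layout[i] is the file id that owns disk block i,
 * with blocks allocated in the order the writes arrived. Returns how
 * many contiguous extents (runs of adjacent blocks) file `fid` uses. */
size_t count_extents(const int *layout, size_t nblocks, int fid)
{
    assert(layout != NULL || nblocks == 0);
    size_t extents = 0;
    for (size_t i = 0; i < nblocks; i++) {
        /* A new extent starts wherever this file's block is not
         * immediately preceded by another of its own blocks. */
        if (layout[i] == fid && (i == 0 || layout[i - 1] != fid))
            extents++;
    }
    return extents;
}
```

With two processes writing block-by-block in alternation, the layout is `{1,2,1,2,1,2,1,2}` and each file ends up in 4 extents; had each file been written alone, the layout `{1,1,1,1,2,2,2,2}` would give 1 extent per file. A later sequential read of either file then needs extra seeks, which is the effect described above.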
I seem to recall a USENIX paper from long ago in which someone tested
locality of reference for reads after laying down writes from
parallel sources sequentially, and it was no worse than trying to zone
the disparate writes, so I'm not really worried about this case.
Also, once you get over a track or two's worth of data, it costs about
the same to seek 3 tracks as it does to seek 10 tracks, so as long as
writes are not *completely* strewn about due to lots of parallel write
activity occurring, it shouldn't be a problem. They won't be, because
writes are cached in the buffer cache prior to being flushed out. We
should get nice long bursts of sequentially ordered data on disk.
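One way buffering can turn interleaved writes into long per-file bursts is sketched below. This assumes, purely for illustration, that the flusher orders a batch of dirty buffers by (file, logical block) before assigning consecutive disk blocks; the post does not say HAMMER flushes exactly this way, and `struct dirtybuf`, `cmp_buf()` and `flush_sorted()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dirty buffer: owning file and logical block in that file. */
struct dirtybuf {
    int  fid;
    long lblk;
};

/* Order buffers by file id, then by logical block within the file. */
static int cmp_buf(const void *a, const void *b)
{
    const struct dirtybuf *x = a, *y = b;
    if (x->fid != y->fid)
        return (x->fid > y->fid) - (x->fid < y->fid);
    return (x->lblk > y->lblk) - (x->lblk < y->lblk);
}

/* Flush one batch of buffered writes: sort the batch, then hand out
 * consecutive disk blocks in sorted order. layout[i] receives the file
 * id that owns disk block i, so each file in the batch lands in one
 * contiguous burst regardless of the order the writes arrived in. */
void flush_sorted(struct dirtybuf *bufs, size_t n, int *layout)
{
    assert(n == 0 || (bufs != NULL && layout != NULL));
    if (n == 0)
        return;
    qsort(bufs, n, sizeof bufs[0], cmp_buf);
    for (size_t i = 0; i < n; i++)
        layout[i] = bufs[i].fid;
}
```

If two writers' blocks arrive interleaved as file 1 block 0, file 2 block 0, file 1 block 1, and so on, the flushed layout becomes `{1,1,1,2,2,2}`: one burst per file per batch. The larger the batch the cache accumulates before flushing, the longer those bursts can be.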
--
I don't like to think that I wasted a ton of time building the
cluster mechanism, and it's kinda sad to see so much code removed. But
most of the work over the last few months has been B-Tree centric,
implementing the inode cache, high-level VOPs, record structures, etc.,
and those parts of the codebase remain intact.
It really got to the point where implementing the last bits was starting
to take way, way too much time. When things start to take that much time
to do, I know I've made a mistake somewhere in the design. Better to
fix it now than to try to slog through the complexity later on.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Kernel mailing list