git: hammer2 - Rewrite internal chain algorithms
dillon at crater.dragonflybsd.org
Fri Sep 27 08:16:24 PDT 2013
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Fri Sep 27 07:50:53 2013 -0700
hammer2 - Rewrite internal chain algorithms
* These are big changes designed to make it easier for the flusher to
order media writes and as a requirement for upcoming 'copies' and other
work. H2 will be somewhat unstable for a short while.
* Replace the single-layer RBTREE per topological level with a stack of
RBTREEs per topological level. This allows conflicts to be stacked
vertically roughly by flush synchronization group instead of trying
to arrange everything in a single 2D RBTREE. Conflicts arise because
the in-memory tree can represent multiple synchronization points with
live and deleted-flagged chains.
The RBTREE compare function is actually simplified by this change, but
the code must contend with a list of RBTREEs (built as conflicts arise)
instead of just one.
* The block table for any given topological layer is now ordered, and this
will now be reflected and required on-media as well. That is, it is
sorted. It is still effectively set-associative in that the dynamic
radix tree model is still being used, but elements must now be sorted.
This improves I/O but more importantly it will allow H2 to use cursors
to track multiple mounts used in 'copies' and later on in clustering.
Running multiple media topologies concurrently was impossible when the
block sets were unordered.
This makes implementing a merged search (blocktable and in-memory chains)
* Remove chain->index tracking. We used to track the block table index
that each in-memory chain intended to use which worked 'ok' when the
block table (such as in an inode or an indirect block) was unordered,
but requires excessive (and unnecessary) coding when the block table
Instead of tracking indices we track the total number of 'live' (not
deleted) elements that will end up in the block table in hammer2_chain_core
and use that to determine when indirect blocks need to be created or not.
Block table entries are now only assigned by the flusher itself when it
determines that an entry must be added or deleted.
* The flusher now operates more deterministically by scanning the stacked
RBTREEs backwards (earliest to most recent). The stacks are not specifically
broken down by flush group (instead stacking only occurs if there is a
key range conflict), so the flusher still checks the modify_tid and
delete_tid for each chain it scans, but there should no longer be any
chance of media updates being misordered.
Because block table indices are now assigned by the flusher, each RBTREE
stack is scanned three times instead of twice. One for the recursion down
and two for the finalization on the way back up which updates the block
table. The block table scan was broken into two passes to handle deletions
in the first pass and insertions/updates in the second pass which allows
the block table to be updated on the fly without having to assign specific
array indices to chains.
* Replace the first_parent/next_parent linkages used to deal with multi-homed
chains (chains with multiple parents which can arise due to renames,
moves into or out of indirect blocks ,etc). Use a queue instead.
Remove the additional ref count used in the old linkage model. We must
still ensure that the final 'live' chain does not get removed when its
refs drops to 0.
Summary of changes:
sys/vfs/hammer2/hammer2.h | 125 ++-
sys/vfs/hammer2/hammer2_chain.c | 1943 +++++++++++++++++++++----------------
sys/vfs/hammer2/hammer2_flush.c | 186 ++--
sys/vfs/hammer2/hammer2_freemap.c | 10 +-
sys/vfs/hammer2/hammer2_inode.c | 69 +-
sys/vfs/hammer2/hammer2_ioctl.c | 39 +-
sys/vfs/hammer2/hammer2_vfsops.c | 61 +-
sys/vfs/hammer2/hammer2_vnops.c | 52 +-
8 files changed, 1455 insertions(+), 1030 deletions(-)
DragonFly BSD source repository
More information about the Commits