cvs commit: src/sys/vfs/hammer hammer.h hammer_btree.c hammer_cursor.h hammer_disk.h hammer_inode.c hammer_ioctl.h hammer_mirror.c hammer_object.c hammer_subs.c hammer_vnops.c

Thu Jul 10 03:22:56 PDT 2008

Matthew Dillon wrote:
dillon      2008/07/09 21:44:33 PDT

DragonFly src repository

  Modified files:
    sys/vfs/hammer       hammer.h hammer_btree.c hammer_cursor.h 
                         hammer_disk.h hammer_inode.c 
                         hammer_ioctl.h hammer_mirror.c 
                         hammer_object.c hammer_subs.c 
                         hammer_vnops.c 
  Log:
  HAMMER 60J/Many: Mirroring

  Finish implementing the core mirroring algorithm.  The last bit was to add
  support for no-history deletions on the master.  The same support also covers
  masters which have pruned records away prior to the mirroring operation.
  As with the work done previously, the algorithm is 100% queue-less and
  has no age limitations.  You could wait a month, and then do a mirroring
  update from master to slave, and the algorithm will efficiently handle it.

  The basic issue that this commit tackles is what to do when records are
  physically deleted from the master.  When this occurs the mirror master
  cannot provide a list of records to delete to its slaves.

  The solution is to use the mirror TID propagation to physically identify
  swaths of the B-Tree in which a deletion MAY have taken place.  The
  mirroring code uses this information to generate PASS and SKIP mrecords.

  A PASS identifies a record (sans its data payload) that remains within
  the identified swath and should already exist on the target.  The
  mirroring target does a simultaneous iteration of the same swath on the
  target B-Tree and deletes records not identified by the master.
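
The PASS handling described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code in hammer_mirror.c: `struct rec`, `apply_pass_swath()` and the flat sorted arrays are hypothetical stand-ins for the real B-Tree cursors, and a plain 64-bit key stands in for the full B-Tree key. Setting `delete_tid` stands in for whatever form of deletion the target performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified target-side record; not HAMMER's on-disk layout. */
struct rec {
    uint64_t key;        /* stand-in for the full B-Tree key */
    uint64_t delete_tid; /* 0 = live, nonzero = marked deleted as of that TID */
};

/*
 * Walk the target records (sorted by key) that fall inside the swath
 * [beg, end) in step with the sorted PASS keys sent for the same swath.
 * A live target record with no matching PASS key no longer exists on the
 * master, so it is marked deleted.  Records outside the swath are never
 * touched.  Returns the number of records newly marked.
 */
size_t apply_pass_swath(struct rec *tgt, size_t ntgt,
                        const uint64_t *pass, size_t npass,
                        uint64_t beg, uint64_t end, uint64_t tid)
{
    size_t i, p = 0, marked = 0;

    assert(tid != 0);           /* 0 is reserved here for "still live" */

    for (i = 0; i < ntgt; i++) {
        if (tgt[i].key < beg)
            continue;           /* before the swath */
        if (tgt[i].key >= end)
            break;              /* past the swath, stop scanning */

        /* Advance the PASS list to the first key >= the target key. */
        while (p < npass && pass[p] < tgt[i].key)
            p++;
        if (p < npass && pass[p] == tgt[i].key)
            continue;           /* master still has this record */

        if (tgt[i].delete_tid == 0) {
            tgt[i].delete_tid = tid;
            marked++;
        }
    }
    return marked;
}
```

Because both inputs are sorted, each side is visited once, so the cost is proportional to the size of the swath rather than to the size of the whole tree.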

  A SKIP is the heart of the algorithm's efficiency.  The same mirror TID
  stored in the B-Tree can also identify large swaths of the B-Tree for which
  *NO* deletions have taken place (which will be most of the B-Tree).  One
  SKIP record can identify an arbitrarily large swath.  The target uses
  the SKIP record to skip that swath on the target.  No scan takes place.
  SKIP records can be generated from any internal node of the B-Tree and cover
  that node's entire sub-tree.
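
A rough C sketch of how a master might turn the propagated mirror TIDs into SKIP and PASS mrecords is given below. It is a simplified model, not the HAMMER implementation: `struct node`, `struct mrec`, `gen_mrecs()` and the in-memory tree are hypothetical, and the comparison `mirror_tid < since_tid` is an assumption about how "nothing changed in this sub-tree since the last mirroring pass" would be tested. Only the deletion-detection side is shown; a real stream would also carry new and modified records with their data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mtype { MREC_SKIP, MREC_PASS };

/* Hypothetical mrecord: SKIP covers [beg, end); PASS names one key in beg. */
struct mrec {
    enum mtype type;
    uint64_t beg, end;
};

/* Hypothetical in-memory B-Tree node; a leaf has nchild == 0. */
struct node {
    uint64_t mirror_tid;        /* newest TID propagated up from below */
    uint64_t beg, end;          /* key range [beg, end) of the sub-tree */
    const struct node *child;   /* array of children (internal nodes) */
    size_t nchild;
    const uint64_t *keys;       /* record keys (leaves) */
    size_t nkeys;
};

/* Store one mrecord if there is room; always return the new count. */
static size_t emit(struct mrec *out, size_t cap, size_t used,
                   enum mtype type, uint64_t beg, uint64_t end)
{
    if (used < cap) {
        out[used].type = type;
        out[used].beg = beg;
        out[used].end = end;
    }
    return used + 1;
}

/*
 * Emit one SKIP for any sub-tree untouched since since_tid, without
 * descending into it.  Otherwise descend, and emit a PASS for every
 * record that still exists in a leaf that may have seen a deletion.
 * Returns the total number of mrecords produced so far.
 */
size_t gen_mrecs(const struct node *n, uint64_t since_tid,
                 struct mrec *out, size_t cap, size_t used)
{
    size_t i;

    assert(n != NULL);

    if (n->mirror_tid < since_tid)
        return emit(out, cap, used, MREC_SKIP, n->beg, n->end);

    if (n->nchild == 0) {
        for (i = 0; i < n->nkeys; i++)
            used = emit(out, cap, used, MREC_PASS, n->keys[i], n->keys[i]);
        return used;
    }

    for (i = 0; i < n->nchild; i++)
        used = gen_mrecs(&n->child[i], since_tid, out, cap, used);
    return used;
}
```

In this model the amount of output depends on how much of the tree changed since `since_tid`, not on how long ago the last mirroring run was, which matches the claim that the algorithm needs no queue and has no age limit.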

  This also provides us with the feature where the retention policy can be
  completely different between a master and a mirror, or between mirrors.
  When the slave identifies a record that must be deleted through the above
  algorithm, it only needs to mark it as historically deleted; it does not
  have to physically delete the record.

I do not understand your last sentence. Does the deletion happen as part
of the mirroring? When I first read it, I thought it meant that I could
mount the slave read/write and delete a record myself. But I think what
you mean is the following situation:

A slave has a record which is then deleted on the master. When mirroring
occurs, the slave historically deletes that record to stay in sync with
the master; it does not physically delete it. This means that the time
at which the record is physically deleted depends on the retention
policy, which may differ between master and slave.

So we could keep weekly snapshots on the slave while the master retains
daily snapshots for up to the last two weeks. But care must be taken not
to prune before mirroring. With two weeks of retention on the master
this should be no problem. Wow! Nice feature!

Regards,

  Michael