Initial filesystem design synopsis.
Matthew Dillon
dillon at apollo.backplane.com
Thu Feb 22 01:24:30 PST 2007
:Quick question...
:
:On Wed, 21 Feb 2007, Matthew Dillon wrote:
:
:> - Infinite snapshots
:>
:> - Multi-master operation
:>
:> - Infinite logless Replication
:
:transid space: monotonic increasing on each replication target, or a
:fine-grained synchronised timestamp*, or something else?
:
:Cheers,
:jan
Monotonic increasing AND a fine-grained timestamp. Low bits of
the timestamp (sub-nanosecond equivalent) would simply be used to
identify the replication target, allowing each target to 'allocate'
transaction ids independently (and also incidentally tell us which
'master' was responsible for the original op that is now being
replicated). A newly created transaction id would at a minimum have
to be larger than the last transaction id... and if this goes beyond
the current 'real time', the host must sleep for a few microseconds
to allow real time to catch up. (In reality the granularity can be
selected such that it is possible to allocate hundreds of thousands
or millions of transids a second across the entire cluster, so this
isn't an issue).
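
A minimal sketch of that allocation rule, under assumptions that are not
part of the design above: a 64-bit nanosecond clock, the low 8 bits
overwritten with the master's id (so one tick is 256ns), and made-up
names (TransIdAllocator, NODE_BITS). It only illustrates "larger than
the last id, and sleep if that runs ahead of real time".

```python
import time

NODE_BITS = 8                       # assumption: low 8 bits identify the master
NODE_MASK = (1 << NODE_BITS) - 1
TICK = 1 << NODE_BITS               # smallest step that keeps the low bits intact


class TransIdAllocator:
    """Hypothetical per-master allocator; not taken from any real code."""

    def __init__(self, node_id, clock=time.time_ns, sleep=time.sleep):
        if not 0 <= node_id <= NODE_MASK:
            raise ValueError("node_id does not fit in the low bits")
        self.node_id = node_id
        self.clock = clock          # returns nanoseconds since the epoch
        self.sleep = sleep
        self.last = 0               # last transid handed out by this master

    def allocate(self):
        # Timestamp in the high bits, master id in the low bits.
        candidate = (self.clock() & ~NODE_MASK) | self.node_id
        if candidate <= self.last:
            # Must be larger than the last id: step by one tick.
            candidate = self.last + TICK
            # If that puts us ahead of real time, wait for it to catch up.
            while self.clock() < (candidate & ~NODE_MASK):
                self.sleep(1e-6)
        self.last = candidate
        return candidate
```

Because every master writes its own id into the low bits, two masters
can never hand out the same transid even if their clocks read the same
value.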
The transaction id must be translatable into a timestamp of sorts
(beyond the monotonic requirement), just to make snapshot handling
sane.
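
To illustrate the translation, here is a sketch that assumes the same
hypothetical layout as before (nanoseconds since the epoch with the low
8 bits replaced by the master's id). The function names are invented
for the example; the point is only that "snapshot as of time T" becomes
a comparison against a transid bound.

```python
from datetime import datetime, timedelta, timezone

NODE_BITS = 8                       # assumption: low 8 bits identify the master
NODE_MASK = (1 << NODE_BITS) - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def transid_to_datetime(transid):
    """Drop the master id and read the rest as nanoseconds since the epoch."""
    ns = transid & ~NODE_MASK
    # datetime only resolves microseconds, so the result is rounded down.
    return EPOCH + timedelta(microseconds=ns // 1000)


def transid_bound_for(dt):
    """Upper bound covering every master's ids in the tick containing dt."""
    ns = ((dt - EPOCH) // timedelta(microseconds=1)) * 1000
    return (ns & ~NODE_MASK) | NODE_MASK


def visible_in_snapshot(record_transid, snapshot_time):
    """A record belongs to the snapshot if it was created at or before it."""
    return record_transid <= transid_bound_for(snapshot_time)
```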
The problem with such a scheme is, of course, that a host which is
not properly time synchronized can throw a big wrench in the works.
And, also, someone could conceivably set the system time
to 0xffffffffffffffff and the filesystem would barf (not be able to
allocate any new transaction ids because, well, it just ran out!).
Sanity checks in the code can handle unsynchronized hosts, guarantee
monotonic increasing transaction ids, and prevent the filesystem
from becoming corrupted. Deliberately generating absurd time stamps
would be a bigger problem... for example, basing your cluster on a
single machine's RTC would be a bad idea. At the very least you
would want an NTP-synchronized time source.
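
The kind of sanity check meant here might look like the following
sketch. The skew tolerance, the 64-bit limit, the bit layout and the
name check_transid are all assumptions made for the example, not a
description of actual code.

```python
NODE_BITS = 8                       # assumption: low 8 bits identify the master
NODE_MASK = (1 << NODE_BITS) - 1
MAX_TRANSID = (1 << 64) - 1         # assumption: transids are 64 bits wide
MAX_SKEW_NS = 5 * 10**9             # hypothetical tolerance: 5 seconds


def check_transid(transid, last_seen, now_ns):
    """Return transid if it passes the checks, else raise ValueError."""
    if not 0 < transid <= MAX_TRANSID:
        raise ValueError("transid out of range")
    if transid <= last_seen:
        raise ValueError("transid is not monotonically increasing")
    if (transid & ~NODE_MASK) > now_ns + MAX_SKEW_NS:
        # Accepting this would drag the whole id space into the future,
        # e.g. a host whose clock was set to 0xffffffffffffffff.
        raise ValueError("transid too far ahead of local time")
    return transid
```

Rejecting ids from the far future keeps one badly set clock from using
up the id space for everyone else.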
Monotonic increasing transaction ids are *CRITICAL* to replication
protocols. Absolutely critical. It's the difference between having
to keep a physical log of changes (with unbounded size), and just
having to store the last transaction ID you had synchronized to.
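
As a rough illustration of why no log is needed, here is a sketch of
one source feeding one target. The record layout (transid, key, value)
and the function names are invented for the example; a real filesystem
would scan its on-disk structures rather than a Python list.

```python
def changes_since(records, last_synced):
    """All records newer than the target's cursor, oldest first."""
    newer = (r for r in records if r[0] > last_synced)
    return sorted(newer, key=lambda r: r[0])


def replicate(source_records, target, last_synced):
    """Apply newer records to target and return the advanced cursor."""
    for transid, key, value in changes_since(source_records, last_synced):
        target[key] = value
        last_synced = transid       # the only replication state to persist
    return last_synced
```

The target can disconnect for any length of time and resume from its
stored transid. The source keeps no per-target history, only the
transid on each record.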
-Matt
Matthew Dillon
<dillon at backplane.com>