VFS journaling... similar technology

Matthew Dillon dillon at apollo.backplane.com
Fri Jul 15 10:59:56 PDT 2005


:I was reading the following thread.
:http://leaf.dragonflybsd.org/mailarchive/kernel/2005-07/msg00024.html
:
:I think it makes sense to point out that there are other systems that
:are linux/BSD like that have VFS messaging.  Andrew Morton has adopted
:v9fs which is an implementation of the 9P filesystem protocol, done
:through messages, and hooks into the linux VFS.
:
:In fact, it should be possible to now use the Plan9Ports "venti"
:system which does something similar to the idea of a realtime-backup
:system. Venti is traditionally used as the block storage backend for a
:"Fossil" filesystem on Plan 9 systems.  It effectively implements
:something like a "write once" repository [yes, you can dump old
:blocks] and for some people this eliminates the need for CVS as
:people can pull old files from the log per-se from previous snapshots
:that are taken.
:
:It's pretty fascinating stuff.  Even if DragonFly never does 9P all of
:these things could be done through this messaging interface that you
:now have.  It'd be really cool for us Inferno/Plan9 geeks to have a
:translation layer to 9P and back on DragonFly.  We could instantly tie
:DragonFly into our grids.  And 9P is pretty danged reliable... I can
:mount my files on a Japanese server from Seattle and the connection
:never seems to break [of course they have stability algorithms for
:that].
:
:Lookin good guys!  It might be helpful for ideas to poke around in
:these more esoteric OSes... some of this kind of work has, in fact,
:been done before [just not exactly the same way].
:
:Dave

    I am not familiar with 9P, but Hiten and I were just talking 
    yesterday about implementing userland VFS.

    It turns out that we are a lot closer to being able to do it than I
    thought we were.  The journaling code's FIFO infrastructure is already
    fully capable of a generic two-way transaction-based stream between
    userland and the kernel, and already solves the issue of large I/O's
    (i.e. someone does a read() or write() of a gigabyte in a single call).
    It is also capable of handling stream restarts (i.e. you kill and restart
    the userland process), though there is one synchronization issue there
    related to large transactions that I haven't solved yet.

    Implementing a userland VFS based on a two-way stream is thus a very
    easily reachable goal.  We basically just create a VFS layer that uses
    the same journaling FIFO mechanism that the journaling code currently
    uses and then instead of encapsulating only the modifying ops in the
    stream, we would encapsulate ALL the ops and process the return stream
    to get the results.
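
    To make the idea concrete, here is a minimal sketch in Python (not
    kernel code) of encapsulating every op as a transaction in a two-way
    stream and matching the results on the way back.  The record layout,
    the opcodes, and names such as UserlandFS and run() are all made up
    for illustration; none of this is the actual journaling format.

```python
import struct
from collections import deque

# Hypothetical record header: transaction id, opcode, payload length.
# This is an illustrative layout, not the DragonFly journal format.
HEADER = struct.Struct("!IHI")
OP_READ, OP_WRITE = 1, 2  # made-up opcodes


def pack(txid, op, payload=b""):
    """Frame one record for the stream."""
    return HEADER.pack(txid, op, len(payload)) + payload


def unpack(record):
    """Split a framed record back into (txid, op, payload)."""
    txid, op, length = HEADER.unpack_from(record)
    return txid, op, record[HEADER.size:HEADER.size + length]


class UserlandFS:
    """Stand-in for the userland process: a single in-memory file."""

    def __init__(self):
        self.data = bytearray()

    def handle(self, record):
        txid, op, payload = unpack(record)
        if op == OP_WRITE:
            self.data += payload
            result = str(len(payload)).encode()  # bytes written
        elif op == OP_READ:
            result = bytes(self.data)
        else:
            result = b""
        # The reply carries the same txid so the sender can match it up.
        return pack(txid, op, result)


def run(ops):
    """'Kernel' side: push ALL ops into the stream, then read the replies."""
    to_user, to_kernel = deque(), deque()  # the two directions of the stream
    fs = UserlandFS()
    for txid, (op, payload) in enumerate(ops):
        to_user.append(pack(txid, op, payload))
    while to_user:
        to_kernel.append(fs.handle(to_user.popleft()))
    results = {}
    while to_kernel:
        txid, _, result = unpack(to_kernel.popleft())
        results[txid] = result
    return results
```

    Calling run([(OP_WRITE, b"hello"), (OP_READ, b"")]) returns a dict
    keyed by transaction id, with the write's byte count under 0 and the
    file contents under 1.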

    As far as robustness goes, I think that is a reachable goal as well
    once I solve this last little issue with restarting large transactions.
    The basic problem is that a large transaction, e.g. a 1GB read or write,
    is far larger than the memory FIFO the kernel uses to buffer the stream.
    The userland process must therefore acknowledge portions of the
    transaction before the transaction as a whole completes, which means it
    must store the data somewhere and fsync it so that it can transparently
    reconnect to the journaling stream if it is killed and restarted.
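
    A rough sketch of that partial-acknowledgement scheme, again in Python
    and purely illustrative: the Spool class, the send() function, and the
    4-byte FIFO_SIZE are assumptions made up for the example, and it
    sidesteps the unsolved synchronization issue by treating the size of
    the fsynced spool file as the acknowledged offset.

```python
import os
import tempfile

FIFO_SIZE = 4  # bytes; tiny stand-in for the kernel's memory FIFO


class Spool:
    """Hypothetical userland side: persist each chunk, then acknowledge it."""

    def __init__(self, path):
        self.path = path

    def acked(self):
        # After a restart, the size of the spool file tells us how much of
        # the transaction was stored and acknowledged before the kill.
        return os.path.getsize(self.path) if os.path.exists(self.path) else 0

    def accept(self, chunk):
        with open(self.path, "ab") as f:
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # data must be stable before we ack
        return self.acked()


def send(data, spool, crash_after=None):
    """'Kernel' side: feed FIFO-sized chunks, resuming at the acked offset.

    crash_after simulates the userland process being killed after that
    many chunks; the function then returns False with the transaction
    still incomplete.
    """
    offset = spool.acked()
    sent = 0
    while offset < len(data):
        if crash_after is not None and sent == crash_after:
            return False
        offset = spool.accept(data[offset:offset + FIFO_SIZE])
        sent += 1
    return True  # every portion acknowledged: transaction complete
```

    Running send() once with crash_after set and then again with a fresh
    Spool on the same path resumes from the last acknowledged chunk
    instead of restarting the whole transaction.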

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




