VFS ROADMAP (and vfs01.patch stage 1 available for testing)

Matthew Dillon dillon at apollo.backplane.com
Fri Aug 13 09:49:59 PDT 2004


:Matthew Dillon wrote:
:>    And, finally, once all of that is done, around stage 75, we may even be
:>    able to rip out the per-vnode locks that UFS uses and replace them with
:>    fine-grained data block range locks, which will allow massive parallelism
:>    even operating on a single file.
:>
:>    This is a pretty ambitious plan, it could take me into next year to
:>    finish it all but when it is done we will be extremely well positioned
:>    for the ultimate goal of implementing fully transparent clustering.
:
:Sorry if these questions are naive.  I've been meaning to ask: what
:are the goals of this "transparent clustering" idea?
:
:I've been reading a bit about the linux OpenSSI project.  There,
:apparently, you have a shared filesystem and a shared process table,
:so you can access processes on other computers, migrate jobs from a
:heavily-loaded node to a less-loaded node, and so on.  Is that the
:idea for DragonFly too?

    Well, shared filesystems certainly.  And, yes, a shared process
    table as well.  However, in the case of the process table it isn't
    really 'shared' so much as the data is replicated, with the master
    being whatever machine is currently actually running the process.
    There is a distinct difference between shared data, which implies
    a peer-peer relationship for the data, and replicated data, which
    implies a master-slave relationship for the data.
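
    Just to sketch the idea (these are purely illustrative declarations,
    not actual DragonFly structures): a replicated process record would be
    held on every node, but only the master node -- the one currently
    running the process -- would be allowed to modify it, pushing updates
    out to the slave copies.

#include <sys/types.h>

/*
 * Illustrative only -- not real DragonFly structures.  Every node in
 * the cluster holds a copy of this record; the node identified by
 * rp_masternode is the one actually running the process and is the
 * only one allowed to mutate the record.  The others hold read-only
 * slave copies refreshed from the master.
 */
struct repl_proc {
    pid_t   rp_pid;         /* cluster-wide process id              */
    int     rp_masternode;  /* node currently running the process   */
    int     rp_generation;  /* bumped by the master on each update  */
    /* ... replicated state: credentials, pgrp, status, etc ...     */
};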

:What about threads?  Will a multi-threaded program on a future
:DragonFly cluster run as if it were on a multi-CPU SMP machine, or
:will it stay on one node (which, as far as I can make out, is the case
:with Linux OpenSSI)? 

    It will be able to run across multiple machines.  In order for this
    to work the data abstraction needs a fully integrated cache management
    subsystem that is machine-aware.  I see no point doing SSI if a 
    threaded program cannot be split across multiple nodes (whether or not
    it is a good idea to do so would depend on what the program does,
    of course).

    This is also one of the reasons why the kernel layer has to do the
    primary lock management for things like I/O atomicity... because eventually
    it will have to integrate with other nodes in the cluster that might also
    be performing I/O on the same 'file'.
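
    The data block range locks mentioned in the roadmap are the shape this
    takes.  Roughly (again, an illustrative sketch rather than the actual
    interface), a range lock describes just the byte range being operated
    on and identifies its owner by cluster node rather than by a pointer
    that only has meaning on one machine:

#include <sys/types.h>

/*
 * Illustrative sketch -- not the actual DragonFly interface.
 * Instead of locking the whole vnode, a node locks only the byte
 * range it is about to read or write.  Because the owner is named
 * by cluster node rather than a local pointer, the lock request can
 * be shipped to whichever node currently masters the file's cache.
 */
struct range_lock {
    off_t   rl_base;    /* first byte covered by the lock         */
    off_t   rl_bytes;   /* length of the locked range             */
    int     rl_type;    /* shared (read) or exclusive (write)     */
    int     rl_node;    /* cluster node that owns the lock        */
};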

:I ask because I do a bit of scientific programming.  I haven't done
:any parallel/clustered programming so far, but may want to in the
:future.  The "standard" way to do it is to build a (usually linux)
:cluster and use MPI or similar special-purpose libraries for
:message-passing.  I'm wondering whether in the long-term picture for
:DragonFly, this will be somehow simplified/improved, or does this have
:nothing to do with DragonFly's goals...
:
:Rahul

    I really hate (the concept of) MPI.  I feel that the only way to do
    clustering properly is to make it be always there, transparent and
    ready to go the moment you fork() or clone().

    What I want is for clustering to be an always-on type of feature, where
    any program that is written will naturally use it but, also, where 
    programs can give the kernel 'hints' about the best type of topology
    the kernel should use.
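
    Something along these lines, say -- and I want to stress that this is a
    hypothetical example, none of these names exist today -- where a program
    tells the kernel that its threads mostly work on disjoint pieces of
    memory, so splitting them across nodes is a reasonable thing to do:

#include <stddef.h>

/*
 * HYPOTHETICAL -- no such interface exists; this is only meant to
 * show the kind of hint a program could hand the kernel.
 */
#define CLUSTER_HINT_PARTITIONED   1   /* threads touch disjoint data */
#define CLUSTER_HINT_SHARED        2   /* threads share data heavily  */

/* hypothetical syscall wrapper */
int cluster_hint(int hint, void *base, size_t bytes);

/*
 * e.g. before spawning workers that each own a private slab:
 *      cluster_hint(CLUSTER_HINT_PARTITIONED, slab_base, slab_size);
 */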

    So, for example, take cryptography.  In particular consider the prime
    number factoring problem that is used to break public keys.  That winds
    up being a huge parallelizable sparse matrix operation (or most of
    it anyway).  If someone were to write a standard threaded program but
    took care to partition the memory such that threads tended to stick to
    their own areas of the matrix 'most of the time'... that is something
    I want DragonFly to be able to cluster naturally, with full data
    coherency and transparency.
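
    In other words, a program no fancier than the ordinary pthreads code
    below (a plain userland example, nothing DragonFly-specific): each
    thread owns a contiguous block of rows and rarely touches anyone
    else's, which is exactly the kind of locality a cluster-aware kernel
    could exploit by migrating whole threads, and their row blocks, to
    other nodes.

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define N        4096
#define NTHREADS 8

static double (*matrix)[N];     /* N x N matrix, allocated in main() */

static void *
worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int lo = id * (N / NTHREADS);
    int hi = lo + (N / NTHREADS);
    int i, j;

    /* this thread touches rows [lo, hi) almost exclusively */
    for (i = lo; i < hi; ++i)
        for (j = 0; j < N; ++j)
            matrix[i][j] *= 2.0;
    return (NULL);
}

int
main(void)
{
    pthread_t tid[NTHREADS];
    int i;

    matrix = calloc(N, sizeof(*matrix));
    for (i = 0; i < NTHREADS; ++i)
        pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
    for (i = 0; i < NTHREADS; ++i)
        pthread_join(tid[i], NULL);
    free(matrix);
    return (0);
}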

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




