Plans for 1.8+ (2.0?)

Rupert Pigott darkb00ng at hotmail.com
Sun Feb 18 02:22:56 PST 2007


On Thu, 01 Feb 2007 09:39:30 -0500, Justin C. Sherrill wrote:

> Sort of.  I'm saying that if Matt rolls his own filesystem instead of
> using ZFS, that new filesystem is either:
> 
> 1: not going to have the variety of tools available with zfs for handling
> things like disk pooling/snapshots/data scrubbing/insert zfs term here.

Of course writing these things takes time, but from what I understand
of Matt's approach to this problem I think it will be possible to
leverage existing tools for most of the essential housekeeping operations.
This is a good thing: it means that people don't have to learn new stuff
to use the system.

> 
> 2: going to have those features, which means Matt's time is going to be
> eaten up reimplementing features already present in other filesystems.
> 

True, but Matt has explained that ZFS doesn't provide the functionality
that DragonFlyBSD needs for cluster computing.

ZFS solves the problem of building a bigger fileserver, but it
doesn't help you distribute that file system across hundreds or thousands
of grid nodes. ZFS doesn't address the issue of high-latency
comms links between nodes, and NFS just curls up and dies when you try to
run it across the Atlantic with 100+ms of latency.
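
A rough back-of-envelope sketch of why latency hurts a chatty
request/response protocol. It assumes every operation waits for one full
round trip before the next one is issued, which is a simplification (real
clients can cache and pipeline). The 0.2 ms LAN round trip and the
10,000-operation workload are illustrative assumptions, not measurements;
only the 100 ms figure comes from the paragraph above.

```python
# Back-of-envelope model: strictly serialized request/response operations.
# All names and numbers here are illustrative assumptions.

def serial_ops_per_second(rtt_seconds: float) -> float:
    """Upper bound on ops/sec when each op waits one full round trip."""
    return 1.0 / rtt_seconds


def time_for_ops(n_ops: int, rtt_seconds: float) -> float:
    """Total wall-clock seconds spent waiting on the wire for n_ops."""
    return n_ops * rtt_seconds


LAN_RTT = 0.0002  # assumed 0.2 ms round trip on a local network
WAN_RTT = 0.100   # 100 ms round trip, as in the transatlantic case

N_OPS = 10_000    # hypothetical workload, e.g. stat()-ing a large tree

if __name__ == "__main__":
    for label, rtt in (("LAN", LAN_RTT), ("WAN", WAN_RTT)):
        print(
            f"{label}: at most {serial_ops_per_second(rtt):.0f} ops/s, "
            f"{time_for_ops(N_OPS, rtt):.1f} s for {N_OPS} ops"
        )
```

Under these assumptions the work per operation is unchanged; only the
waiting grows, which is why a protocol designed around LAN latencies
degrades so badly over a long link.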

I don't know if IBM's GridFS copes any better with the latency. It
certainly scales a lot better, but the barrier to adoption is $$$: it
costs $$$ to buy, and a lot more $$$ to hire and train the SAs to run
it. There are other options like AFS too, but people tend to be put off by
the learning curve and the fact that it's an extra rather than something
that is packaged with the OS.

Then there is plan C, the Grid Cache... Quite a few people are buying
stuff like Tangosol and Gemfire to solve the inadequacies of NFS/CIFS and
avoid paying for GridFS, or learning to use something like AFS. Not a
great option in my opinion because these Grid Caches have to provide all
the FS functionality AND they can't be managed with the existing tools
(eg: dd, ls, find, cp, mv, dump, restore etc).

I think that there is a need for a distributed FS suitable for Grids &
Clusters that doesn't require $$$ of retraining to use. Matt's approach to
DragonFlyBSD seems to be aiming to fill that hole, and about time too! :)

> It's a moot point until Matt can evaluate modifying existing filesystems
> vs building a new one, though.  I don't want NIH-ism to get in the way
> of having something neat, though

Even if you port ZFS yourself, it still won't solve the problem of
distributing persistent data across several hundred nodes.

I looked at writing a BSD-licensed clone for OpenBSD, but I realised that
it just won't help solve the networking problems posed by Grids &
Clusters. I think that a filesystem that is built from the ground up to
work with SYSLINK will, though. :)

I think that the real work lies in SYSLINK... How do you deal with node
failure? How do you recover? What happens if the cluster gets split down
the middle? Lots of tricky problems there that will probably take up
10x as much of Matt's time as writing a FS. :)

Cheers,
Rupert




