Plans for 1.8+ (2.0?)

Chris Csanady cc at 137.org
Thu Feb 1 09:41:57 PST 2007


2007/1/31, Matthew Dillon <dillon at apollo.backplane.com>:
    I am seriously considering our options with regards to ZFS or a
    ZFS-like filesystem.  We clearly need something to replace UFS,
    but I am a bit worried that porting ZFS would be as much work
    as simply designing a new filesystem from scratch.

It is worth noting that Sun is looking at extending ZFS to be a
cluster-aware filesystem.  If you dig through their mailing list
archives, you will see that the topic comes up every now and then.

In any case, I feel that it would be best to port ZFS, even if you
intend to create a new filesystem.  It is a great local filesystem,
and it will offer compatibility with Solaris, MacOS, and FreeBSD (and
probably Linux, once it is relicensed).  It seems a waste not to take
advantage of Sun's efforts, especially since the code is so portable;
in fact, almost all of the OS-dependent bits are in a single file.

Pawel Jakub Dawidek made very rapid progress on the FreeBSD port.
Considering that DragonFly now has a virtual kernel and a much simpler
VFS, a DragonFly port should be considerably easier.  If you were to
work on it, I wouldn't be surprised if you could finish the core of
the work in a weekend.  The most time-consuming part will probably be
interfacing with the device layer: supporting EFI labels,
automatically discovering disks, and so forth.

They even have a porting guide, if you are interested:

	 http://www.opensolaris.org/os/community/zfs/porting

    One big advantage of a from-scratch design is that I would be
    able to address the requirements of a clustered operating system
    in addition to the requirements of multi-terabyte storage media.

Even with a from-scratch design, ZFS is well worth careful
examination.  There are many things it does very well, and
re-implementing even a fraction of its features would be very
time-consuming.  In the meantime, it would be good to have ZFS.

The one part of it that I think could be handled better is the
inflexibility of the redundancy.  It would be nice to specify
redundancy per-dataset, and not be tied to the static redundancy of
the underlying vdev.  RAIDZ is also a bit inflexible itself; it would
be great to throw arbitrarily sized disks into a pool and not have to
worry about the layout at all.  It would be better still to distribute
data blocks and recovery blocks (much like par2 does) across machines.
Full 3-way mirroring is quite expensive, but would be necessary over a
WAN.  The current limitations, though, seem to be the result of a
compromise, considering that this is a very difficult problem.
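
To illustrate the "data blocks plus recovery blocks" idea, here is a
minimal Python sketch of the simplest possible case: one XOR recovery
block protecting a group of equal-sized data blocks.  It is only an
illustration under stated assumptions.  It is not how ZFS or RAIDZ lay
out data, and par2 itself uses Reed-Solomon codes so that it can
survive more than one loss.  The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Build the single recovery block for a group of data blocks."""
    return xor_blocks(data_blocks)


def recover(blocks, parity):
    """Rebuild the one missing block (marked as None) from the survivors.

    Assumption: exactly one block is lost.  A single XOR recovery block
    cannot repair two or more losses.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("exactly one block must be missing")
    survivors = [b for b in blocks if b is not None]
    rebuilt = xor_blocks(survivors + [parity])
    result = list(blocks)
    result[missing[0]] = rebuilt
    return result


if __name__ == "__main__":
    # Pretend each block lives on a different machine.
    data = [b"node", b"disk", b"pool"]
    parity = make_parity(data)
    damaged = [data[0], None, data[2]]  # the second machine is gone
    assert recover(damaged, parity) == data
```

The storage overhead here is one extra block per group, compared with
two extra full copies for a 3-way mirror; the price is that only a
single loss per group can be repaired.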

Finally, I think that the network filesystem is the single largest
gaping hole in modern operating systems.  All of the commonly
available ones are absolutely awful, and I have been looking forward
to DragonFly's CCMS.  It seems that with CCMS and the VFS messaging
work, it should be almost trivial to create a fast and solid remote
filesystem.  That said, the very paradigm of the network filesystem
should probably be tossed in favor of the clusterable filesystem which
I imagine you have in mind.

Chris




