Plans for 1.8+ (2.0?)
Chris Csanady
cc at 137.org
Thu Feb 1 09:41:57 PST 2007
2007/1/31, Matthew Dillon <dillon at apollo.backplane.com>:
> I am seriously considering our options with regards to ZFS or a
> ZFS-like filesystem. We clearly need something to replace UFS,
> but I am a bit worried that porting ZFS would be as much work
> as simply designing a new filesystem from scratch.
It is worth noting that Sun is looking at extending ZFS to be a
cluster-aware filesystem. If you dig through their mailing list
archives, you will see that it is a topic that pops up every now and
then.
In any case, I feel that it would be best to port ZFS, even if you
intend to create a new filesystem. It is a great local filesystem,
and it will offer compatibility with Solaris, MacOS, and FreeBSD (and
probably Linux, once it is relicensed). It seems a waste not to take
advantage of Sun's efforts, especially since the code is so
portable--in fact, almost all of the OS dependent bits are in a single
file.
Pawel Jakub Dawidek made very rapid progress on the FreeBSD port.
Considering that DragonFly now has a virtual kernel and much simpler
VFS, the project should be vastly easier. If you were to work on it,
I wouldn't be surprised if you could finish the core of the work in a
weekend. Probably the most time-consuming part will be interfacing
with the device layer; things like supporting EFI labels,
automatically discovering disks, and so forth.
They even have a porting guide if you are interested:
http://www.opensolaris.org/os/community/zfs/porting
> One big advantage of a from-scratch design is that I would be
> able to address the requirements of a clustered operating system
> in addition to the requirements of multi-terabyte storage media.
Even with a from-scratch design, ZFS is well worth careful
examination. There are many things it does very well, and
re-implementing even a fraction of its features would be very time
consuming. In the meantime, it would be good to have ZFS.
The one part of it that I think could be handled better is the
inflexibility of the redundancy. It would be nice to specify
redundancy per-dataset, and not be tied to the underlying static vdev
redundancy. RAIDZ is also a bit inflexible itself; it would be great
to throw arbitrarily sized disks into a pool and not have to worry
about the layout at all. It would also be nice to distribute blocks and
recovery blocks (much like with par2) across machines. Full 3-way
mirroring is quite expensive, but would be necessary over a WAN. The
current limitations, though, seem to be the result of a compromise,
considering that this is a very difficult problem.
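
To make the recovery-block idea concrete, here is a rough sketch. It
is not how ZFS or par2 actually work: par2 uses Reed-Solomon codes and
can survive several losses, while this uses a single XOR parity block,
which can rebuild only one lost block. The function names and the
three "machines" are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_recovery_block(data_blocks):
    """Compute one parity block covering all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, recovery_block):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [recovery_block])


# Three data blocks, as if each lived on a different machine (hypothetical).
data = [b"alpha---", b"bravo---", b"charlie-"]
recovery = make_recovery_block(data)

# Pretend the machine holding data[1] has disappeared.
rebuilt = rebuild_missing([data[0], data[2]], recovery)
```

The storage overhead here is one extra block for three, compared with
two extra full copies for 3-way mirroring, at the cost of tolerating
only a single failure.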
Finally, I think that the network filesystem is the single largest
gaping hole in modern operating systems. All of the commonly
available systems are absolutely awful, and I have been anticipating
DragonFly's CCMS. It seems that with this and the VFS messaging work,
it should be almost trivial to create a fast and solid remote
filesystem. That said, the very paradigm of the network filesystem
should probably be tossed in favor of the clusterable filesystem which
I imagine you have in mind.
Chris