VFS ROADMAP (and vfs01.patch stage 1 available for testing)
dillon at apollo.backplane.com
Thu Aug 12 18:24:08 PDT 2004
This represents the first of the many stages that will be needed to
thread the VFS subsystem. I am going to have to commit the work
incrementally, stage by stage, to keep the number of bugs to a minimum.
This patch rips out the nearly unreadable and hugely inlined VOP
operations vector code and replaces it with a fixed structure and
*theoretically* this patch does not make any major operational changes,
despite replacing nearly the entire VOP operations vector support code.
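
For readers unfamiliar with the idea, here is a minimal sketch of what a
fixed operations structure can look like. It is not taken from vfs01.patch.
Every name in it (ex_vnode, ex_vnodeops, ex_fs_open, ex_default_close) is
hypothetical. The point it illustrates is that each operation gets a named
slot fixed at compile time, and a filesystem fills unused slots with a
default handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_vnode;                 /* hypothetical stand-in for struct vnode */

/* Fixed structure: one named function pointer per operation. */
struct ex_vnodeops {
    int (*vop_open)(struct ex_vnode *vp, int mode);
    int (*vop_close)(struct ex_vnode *vp);
};

struct ex_vnode {
    const struct ex_vnodeops *v_ops;   /* operations for this vnode's filesystem */
    int v_opencount;
};

/* Default handler for an operation the filesystem does not implement. */
int ex_default_close(struct ex_vnode *vp)
{
    (void)vp;
    return EOPNOTSUPP;
}

/* A filesystem-specific open that just counts opens. */
int ex_fs_open(struct ex_vnode *vp, int mode)
{
    assert(vp != NULL);
    (void)mode;
    vp->v_opencount++;
    return 0;
}

/* The filesystem's operations, laid out once at compile time. */
const struct ex_vnodeops ex_fs_ops = {
    .vop_open  = ex_fs_open,
    .vop_close = ex_default_close,
};
```

A caller reaches an operation through the vnode, for example
vp->v_ops->vop_open(vp, mode), with no table index to look up.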
Some testing would be appreciated. This patch will go in tomorrow.
ROADMAP FOR THREADED VFS's
Then I'll start working on stage 2 which will be to wrap all the
VOP forwarding calls (the VCALL and VOCALL macros).
That will give us the infrastructure necessary to implement a
messaging interface in a later stage (probably around stage 15 :-)).
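
A rough sketch of why wrapping the forwarding calls helps, under stated
assumptions: the names below (ex_open_args, ex_vop_open, ex_wrapped_calls)
are invented for illustration and are not the VCALL/VOCALL replacements.
The wrapper packs the arguments into one structure and is the single place
every caller goes through, so a later stage could swap the direct call for
queueing that structure as a message.

```c
#include <assert.h>
#include <stddef.h>

struct ex_vnode;                 /* hypothetical stand-in for struct vnode */

/* All arguments for one operation, packed so they can travel as a unit. */
struct ex_open_args {
    struct ex_vnode *a_vp;
    int a_mode;
};

struct ex_vnodeops {
    int (*vop_open)(struct ex_open_args *ap);
};

struct ex_vnode {
    const struct ex_vnodeops *v_ops;
    int v_opencount;
};

/* Counts calls that passed through the wrapper (demonstration only). */
int ex_wrapped_calls;

/* Wrapper: the only entry point callers use for this operation. */
int ex_vop_open(struct ex_vnode *vp, int mode)
{
    struct ex_open_args ap = { .a_vp = vp, .a_mode = mode };

    assert(vp != NULL && vp->v_ops != NULL && vp->v_ops->vop_open != NULL);
    ex_wrapped_calls++;
    /* Today this is a direct call. A messaging stage could queue `ap` here. */
    return vp->v_ops->vop_open(&ap);
}

/* A filesystem-side handler that receives the packed arguments. */
int ex_fs_open(struct ex_open_args *ap)
{
    ap->a_vp->v_opencount++;
    return 0;
}

const struct ex_vnodeops ex_fs_ops = { .vop_open = ex_fs_open };
```

Because callers never touch v_ops directly in this arrangement, changing
the transport later means editing the wrapper and nothing else.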
The really nasty stuff starts at stage 3. Before I can implement the
messaging interface I have to:
* Lock namespaces via the namecache rather than via directory
vnode locking (ultimately means that directories do not have
to be exclusively locked during create/delete/rename). Otherwise
even the simplest, fully cached namespace operations will wind up
needing a message.
This step alone will require major changes to the arguments passed
in just about every single VOP call because we will be switching
from passing directory vnodes to passing namecache pointers.
* Ranged data locks will replace the vnode lock for I/O atomicity
guarantees (ultimately means that if program #1 is
blocked writing to a vnode program #2 can still read cached
data from that same vnode without blocking on program #1).
Otherwise the messaging latency will kill I/O performance.
* vattr information will be cached in the vnode so it can be
accessed directly without having to enter the VFS layer.
* VM objects will become mandatory for all filesystems and will
also be made directly accessible to the kernel without having
to enter the VFS layer (ultimately this will result in greatly
improved read() and write() performance).
* Implement a bottom-up cache invalidation abstraction for
namespace, attribute, and file data, so layered filesystems
By the time that's done I'll be at stage 40 or so :-). THEN I will be
* Give UIO's the capability to copy data across threads (so a VFS
thread can access the user data supplied to it in a UIO). This
may be implemented by switching the UIO's to XIO's, breaking down
large read() and write() requests, and relying on the range
locks to preserve atomicity.
* Thread the VFS layer and convert the wrappers to a messaging
interface. By this time just about the only real work the VFS
layer will have to do will (hopefully) either be asynchronous or
require an I/O anyway.
* Implement a user process based VFS API that actually works.
* Rip out the remaining vnode locking code from the point of view of
the kernel. The vnode locks become 'local' to the VFS, and then
only if the VFS is multi-threaded.
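
The UIO item in the list above mentions breaking down large read() and
write() requests while relying on range locks for atomicity. A minimal
sketch of the splitting half follows. It assumes the caller already holds
a range lock covering the whole request, which is what would keep the
pieces atomic as a group. ex_chunked_io, ex_chunk_fn, ex_tally and
ex_count_chunk are hypothetical names, and no real UIO or XIO structure is
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback that performs (or forwards) one piece of the request. */
typedef int (*ex_chunk_fn)(uint64_t off, size_t len, void *arg);

/*
 * Split the byte range [off, off + len) into pieces of at most chunk_max
 * bytes and hand each piece to fn in order. Stops at the first error.
 * Assumption: the caller holds a range lock over the whole request.
 */
int ex_chunked_io(uint64_t off, size_t len, size_t chunk_max,
                  ex_chunk_fn fn, void *arg)
{
    assert(chunk_max > 0 && fn != NULL);

    while (len > 0) {
        size_t n = len < chunk_max ? len : chunk_max;
        int error = fn(off, n, arg);

        if (error != 0)
            return error;
        off += n;
        len -= n;
    }
    return 0;
}

/* Demonstration callback: tallies how many pieces and bytes it saw. */
struct ex_tally {
    unsigned chunks;
    size_t bytes;
};

int ex_count_chunk(uint64_t off, size_t len, void *arg)
{
    struct ex_tally *t = arg;

    (void)off;
    t->chunks++;
    t->bytes += len;
    return 0;
}
```

For example, a 10000-byte request split with a 4096-byte limit reaches the
callback as three pieces of 4096, 4096 and 1808 bytes.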
And, finally, once all of that is done, around stage 75, we may even be
able to rip out the per-vnode locks that UFS uses and replace them with
fine-grained data block range locks, which will allow massive parallelism
even operating on a single file.
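
To make the range-lock idea concrete, here is a minimal sketch of the
conflict test such a lock would need. It assumes half-open byte ranges and
two modes, shared and exclusive. The names (ex_range_lock,
ex_range_conflicts) are hypothetical, and a real implementation would also
need a data structure to hold the locks and a way to sleep on conflicts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ex_lock_mode { EX_SHARED, EX_EXCLUSIVE };

/* A lock on the byte range [start, end) of one file. */
struct ex_range_lock {
    uint64_t start;            /* first byte covered */
    uint64_t end;              /* one past the last byte covered */
    enum ex_lock_mode mode;
};

/*
 * Two range locks conflict only when their ranges overlap and at least
 * one of them is exclusive. Non-overlapping ranges never conflict, which
 * is what lets readers and writers proceed in parallel on one file.
 */
bool ex_range_conflicts(const struct ex_range_lock *held,
                        const struct ex_range_lock *want)
{
    assert(held->start < held->end && want->start < want->end);

    bool overlap = held->start < want->end && want->start < held->end;
    bool both_shared = held->mode == EX_SHARED && want->mode == EX_SHARED;

    return overlap && !both_shared;
}
```

With this test, a writer holding [0, 4096) exclusively does not block a
reader of [8192, 12288), but does block a reader of [2048, 6144).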
This is a pretty ambitious plan. It could take me into next year to
finish it all, but when it is done we will be extremely well positioned
for the ultimate goal of implementing fully transparent clustering.