VFS ROADMAP (and vfs01.patch stage 1 available for testing)

Thu Aug 12 18:24:08 PDT 2004

    This represents the first of the many stages that will be needed to
    thread the VFS subsystem.  I am going to have to commit the work
    incrementally, stage by stage, to keep the number of bugs to a minimum.

	fetch http://leaf.dragonflybsd.org/~dillon/vfs01.patch

    This patch rips out the nearly unreadable and hugely inlined VOP
    operations vector code and replaces it with a fixed structure and
    wrapper procedures.
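
    As a rough illustration of the "fixed structure plus wrapper
    procedures" idea (this is NOT the code in vfs01.patch; every type
    and function name below is made up for the example), each operation
    gets a named slot in a plain structure and one out-of-line wrapper
    that all callers go through:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified types; not the real kernel definitions. */
struct sk_vnode;

/* Fixed operations structure: one named slot per operation. */
struct sk_vop_ops {
    int (*vop_open)(struct sk_vnode *vp, int mode);
    int (*vop_close)(struct sk_vnode *vp);
};

struct sk_vnode {
    const struct sk_vop_ops *v_ops;   /* operations for this vnode's VFS */
    int v_opencount;                  /* toy state used by the demo ops */
};

#define SK_EOPNOTSUPP 45              /* assumed error value for the sketch */

/*
 * Wrapper procedures: ordinary functions instead of inlined macro
 * dispatch, so there is a single place to later hook in forwarding
 * or messaging.
 */
int
sk_vop_open(struct sk_vnode *vp, int mode)
{
    assert(vp != NULL && vp->v_ops != NULL);
    if (vp->v_ops->vop_open == NULL)
        return SK_EOPNOTSUPP;
    return vp->v_ops->vop_open(vp, mode);
}

int
sk_vop_close(struct sk_vnode *vp)
{
    assert(vp != NULL && vp->v_ops != NULL);
    if (vp->v_ops->vop_close == NULL)
        return SK_EOPNOTSUPP;
    return vp->v_ops->vop_close(vp);
}

/* A toy filesystem that only counts opens. */
static int
demo_open(struct sk_vnode *vp, int mode)
{
    (void)mode;
    vp->v_opencount++;
    return 0;
}

static int
demo_close(struct sk_vnode *vp)
{
    vp->v_opencount--;
    return 0;
}

const struct sk_vop_ops demo_ops = {
    .vop_open = demo_open,
    .vop_close = demo_close,
};

/* A filesystem that leaves every slot empty. */
const struct sk_vop_ops empty_ops = { NULL, NULL };
```

    Callers only ever see sk_vop_open() and sk_vop_close(), so the
    dispatch mechanism behind them can change without touching the
    call sites.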

    *Theoretically*, this patch does not make any major operational changes,
    despite replacing nearly the entire VOP operations vector support code
    in vfs_init.c.

    Some testing would be appreciated.  This patch will go in tomorrow.

    --
			    ROADMAP FOR THREADED VFS's

    Then I'll start working on stage 2 which will be to wrap all the 
    VOP forwarding calls (the VCALL and VOCALL macros).

    That will give us the infrastructure necessary to implement a 
    messaging interface in a later stage (probably around stage 15 :-)).

    The really nasty stuff starts at stage 3.  Before I can implement the
    messaging interface I have to:

	* Lock namespaces via the namecache rather than via directory
	  vnode locking (ultimately means that directories do not have
	  to be exclusively locked during create/delete/rename).  Otherwise
	  even the simplest, fully cached namespace operations will wind up
	  needing a message.

	  This step alone will require major changes to the arguments passed
	  in just about every single VOP call because we will be switching
	  from passing directory vnodes to passing namecache pointers.

	* Ranged data locks will replace the vnode lock for I/O atomicity
	  guarantees (ultimately means that if program #1 is
	  blocked writing to a vnode, program #2 can still read cached
	  data from that same vnode without blocking on program #1).
	  Otherwise the messaging latency will kill I/O performance.

	* vattr information will be cached in the vnode so it can be
	  accessed directly without having to enter the VFS layer.

	* VM objects will become mandatory for all filesystems and will
	  also be made directly accessible to the kernel without having
	  to enter the VFS layer (ultimately this will result in greatly
	  improved read() and write() performance).

	* Implement a bottom-up cache invalidation abstraction for 
	  namespace, attribute, and file data, so layered filesystems
	  work properly.
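
    To make the ranged data lock item above a little more concrete,
    here is a minimal sketch of the conflict test such a lock might use.
    It is an assumption for illustration only: the names, the
    shared/exclusive split, and the flat array of held locks are all
    hypothetical, and lengths are assumed to be non-zero and small
    enough that off + len does not overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum rl_mode { RL_SHARED, RL_EXCLUSIVE };

/* A lock on the half-open byte range [off, off + len). */
struct range_lock {
    uint64_t     off;
    uint64_t     len;
    enum rl_mode mode;
};

/* True if the two byte ranges share at least one byte. */
static bool
rl_overlaps(const struct range_lock *a, const struct range_lock *b)
{
    return a->off < b->off + b->len && b->off < a->off + a->len;
}

/* Two locks conflict only if they overlap and at least one is exclusive. */
bool
rl_conflicts(const struct range_lock *a, const struct range_lock *b)
{
    if (a->mode == RL_SHARED && b->mode == RL_SHARED)
        return false;
    return rl_overlaps(a, b);
}

/* A request can be granted if it conflicts with none of the held locks. */
bool
rl_can_grant(const struct range_lock *held, size_t nheld,
             const struct range_lock *req)
{
    for (size_t i = 0; i < nheld; i++) {
        if (rl_conflicts(&held[i], req))
            return false;
    }
    return true;
}
```

    With a whole-vnode lock the writer would block every reader; with
    the test above, a writer holding [0, 4096) only blocks requests that
    touch those bytes, and a read of any other range proceeds.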

    By the time that's done I'll be at stage 40 or so :-).  THEN I will be
    able to:

	* Give UIO's the capability to copy data across threads (so a VFS
	  thread can access the user data supplied to it in a UIO).  This
	  may be implemented by switching the UIO's to XIO's, breaking down
	  large read() and write() requests, and relying on the range 
	  locks to preserve atomicity.

	* Thread the VFS layer and convert the wrappers to a messaging
	  interface.  By this time just about the only real work the VFS
	  layer will have to do will (hopefully) either be asynchronous or
	  require an I/O anyway.

	* Implement a user process based VFS API that actually works.

	* Rip out the remaining vnode locking code from the point of view of
	  the kernel.  The vnode locks become 'local' to the VFS, and then
	  only if the VFS is multi-threaded.  
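
    The "breaking down large read() and write() requests" idea from the
    first item above could look roughly like the sketch below.  The
    chunk size and all names are hypothetical; the assumption is that
    the caller already holds a range lock over the whole request, so the
    pieces can be handed off one at a time without losing atomicity.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_CHUNK 65536u               /* hypothetical per-piece limit */

struct io_chunk {
    uint64_t off;                     /* file offset of this piece */
    size_t   len;                     /* bytes in this piece */
};

/*
 * Split the request [off, off + len) into pieces of at most SK_CHUNK
 * bytes.  Writes up to max entries into out[] and returns how many
 * were written; a return value equal to max may mean the request was
 * not fully covered.
 */
size_t
sk_split_request(uint64_t off, size_t len, struct io_chunk *out, size_t max)
{
    size_t n = 0;

    while (len > 0 && n < max) {
        size_t piece = len < SK_CHUNK ? len : SK_CHUNK;

        out[n].off = off;
        out[n].len = piece;
        off += piece;
        len -= piece;
        n++;
    }
    return n;
}
```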

    And, finally, once all of that is done, around stage 75, we may even be
    able to rip out the per-vnode locks that UFS uses and replace them with
    fine-grained data block range locks, which will allow massive parallelism
    even operating on a single file.

    This is a pretty ambitious plan.  It could take me into next year to
    finish it all, but when it is done we will be extremely well positioned
    for the ultimate goal of implementing fully transparent clustering.

						-Matt