VOP_RENAME of the future

Sun Aug 6 22:26:27 PDT 2006

:to be done for each particular VFS. However, my impression is that the
:SUN guys have actually taken the effort to do it carefully enough, while
:the BSDs just let the old hacky 4.4 BSD code rot further peacefully. At
:least, all four of them have the lines
:
:        if ((error = vn_lock(fvp, LK_EXCLUSIVE)) != 0)
:                goto abortit;

    That's pretty much how the BSDs do it, because the old filesystems
    like UFS require an atomic LOOKUP + OP (e.g. LOOKUP + CREATE,
    LOOKUP + REMOVE) combination.  UFS stores side effects from the LOOKUP
    which are then used in the later CREATE/REMOVE/RENAME/SYMLINK/MKNOD/RMDIR,
    etc.  The vnode must remain locked through both operations.
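
    To make the LOOKUP + OP coupling concrete, here is a minimal
    userland sketch.  It is not UFS code: toy_dir, cached_slot and the
    toy_*() functions are hypothetical stand-ins, and the "lock" is a
    plain flag rather than vn_lock().  The lookup leaves a side effect
    (the slot to use) in the directory structure, and the create trusts
    it, which is only safe if the lock is held across both steps.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real UFS structures. */
#define TOY_SLOTS   8
#define TOY_NAMELEN 16

struct toy_dir {
    int  locked;                           /* 1 while the "vnode" lock is held */
    char names[TOY_SLOTS][TOY_NAMELEN];    /* empty string means free slot */
    int  cached_slot;                      /* side effect left by the lookup */
};

static void toy_lock(struct toy_dir *dp)   { assert(!dp->locked); dp->locked = 1; }
static void toy_unlock(struct toy_dir *dp) { assert(dp->locked);  dp->locked = 0; }

/*
 * LOOKUP: returns 0 if the name exists.  Otherwise returns -1 and
 * remembers the first free slot (or -1 if the directory is full).
 */
static int toy_lookup(struct toy_dir *dp, const char *name)
{
    int i;

    assert(dp->locked);
    dp->cached_slot = -1;
    for (i = 0; i < TOY_SLOTS; i++) {
        if (strcmp(dp->names[i], name) == 0) {
            dp->cached_slot = i;
            return 0;
        }
        if (dp->names[i][0] == '\0' && dp->cached_slot < 0)
            dp->cached_slot = i;
    }
    return -1;
}

/*
 * CREATE: trusts cached_slot from the preceding lookup.  If the lock
 * had been dropped in between, another thread could have taken the slot.
 */
static int toy_create(struct toy_dir *dp, const char *name)
{
    assert(dp->locked);
    if (dp->cached_slot < 0)
        return -1;
    strncpy(dp->names[dp->cached_slot], name, TOY_NAMELEN - 1);
    dp->names[dp->cached_slot][TOY_NAMELEN - 1] = '\0';
    return 0;
}

/* The atomic LOOKUP + CREATE pair: one lock hold spans both steps. */
static int toy_lookup_create(struct toy_dir *dp, const char *name)
{
    int error = -1;                        /* name already exists */

    toy_lock(dp);
    if (toy_lookup(dp, name) != 0)
        error = toy_create(dp, name);
    toy_unlock(dp);
    return error;
}
```

    Calling toy_lookup_create() twice with the same name on a zeroed
    toy_dir succeeds the first time and fails the second time.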

:in their ufs_rename() -- i.e., let's treat issue 1) naively, if it fails,
:well... goddam the bloody thing -- which doesn't seem to be much of a
:product of a well-thought-out design. (All these are really just my
:impressions after a quick run-through of the code, take these judgements
:with a grain of salt please. It's not intended to be flame bait.)
:
:In particular, what does DragonFly look like?
:
:1) has been partially solved, because namecache locking is taken care of by
:the VFS layer. However, vnode-level locking still has to be done by the
:particular file systems (and it suffers from issues like the presence
:of the above two lines). I wonder if updating the filesystems so that
:they will directly use the namecache API will automagically solve this,
:or there is still some hard work to be done at the VFS level...

    Yes.  It is possible to update the filesystem code (e.g. UFS) to
    directly use the new 'N' ops (e.g. VOP_NRESOLVE, VOP_NRENAME, etc).  

    But it is not an easy thing to do.  Because of the side effects that
    UFS stores in the in-memory inode from the VOP_LOOKUP + VOP_(oldop)
    combination, significant portions of UFS have to be rewritten
    to implement the new ops.

    If that were to be done then, yes, the old ops can be thrown away
    and the direct vnode locking could be done in a more fine-grained
    fashion.  It's a huge job though.

:2) is not yet handled by the VFS, but since we have the namecache code,
:I think it would be easy to implement in a generic way (just like Linux
:does this with their dentry thingy). Of course, this won't be really
:effective without dealing with 3)...
:
:3) -- I really don't see if there is anything one could do at the VFS
:level apart from the ugly big rename lock...
:
:So, whatcha gonna do about it?
:
:Csaba

    The two files in a VOP_NRENAME are protected by namecache locks.  But
    the actual directory entries are not protected.  The hardest thing to
    do to implement a fine-grained lock would be for the filesystem to
    do the right thing while it is modifying the directory entries.  Buffer
    cache buffer locks might be sufficient, or they might not be. I don't
    know.
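
    As a rough illustration of the split between name-level and
    directory-level protection, here is a toy sketch.  Everything in it
    is an assumption made for the example: toy_ncp, toy_dirbuf and
    toy_nrename() are hypothetical, the locks are plain flags, and
    taking the two directory locks in address order is just one common
    way to avoid an A/B versus B/A deadlock, not something the existing
    code does.  Whether a buffer-level lock is actually enough is the
    open question mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real kernel structures. */
struct toy_ncp    { int locked; };                /* a namecache entry */
struct toy_dirbuf { int locked; int nentries; };  /* a directory's entries */

static void dirbuf_lock(struct toy_dirbuf *b)   { assert(!b->locked); b->locked = 1; }
static void dirbuf_unlock(struct toy_dirbuf *b) { assert(b->locked);  b->locked = 0; }

/*
 * Move one entry from fdir to tdir.  The caller must already hold both
 * name locks; the directory entries get their own, separate protection.
 */
static int toy_nrename(struct toy_ncp *fncp, struct toy_ncp *tncp,
                       struct toy_dirbuf *fdir, struct toy_dirbuf *tdir)
{
    struct toy_dirbuf *first = fdir, *second = tdir;

    /* Name-level protection is the caller's job. */
    if (!fncp->locked || !tncp->locked)
        return -1;
    if (fdir->nentries == 0)
        return -1;                        /* nothing to rename */

    if (fdir == tdir) {
        /* Same directory: one lock, entry count is unchanged. */
        dirbuf_lock(fdir);
        dirbuf_unlock(fdir);
        return 0;
    }

    /* Two directories: always lock the lower address first. */
    if ((uintptr_t)first > (uintptr_t)second) {
        first = tdir;
        second = fdir;
    }
    dirbuf_lock(first);
    dirbuf_lock(second);
    fdir->nentries--;
    tdir->nentries++;
    dirbuf_unlock(second);
    dirbuf_unlock(first);
    return 0;
}
```

    The name locks say nothing about the directory contents, so the
    sketch refuses to run without them but still takes the directory
    locks itself before touching the entry counts.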

    I am not intending to do any more work on UFS than I have to.  Any
    *NEW* VFS ports can just use the new VOP_N*() calls and not implement
    the old junk at all.

    As part of the clustering and userland VFS work I may remove the high
    level vnode locking requirements for nearly all the VOP ops, but all
    that means is that most of the filesystems (e.g. UFS) would have to do
    the locking internally, themselves.  It isn't really removing any
    locking.  It would just move it into the VFS so we don't hold exclusive
    locks on vnodes over remote VFSs or userland VFSs that could otherwise
    deadlock the entire machine.
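
    The difference between the two arrangements can be sketched in a
    few lines of userland C.  This is only an illustration under stated
    assumptions: toy_vnode and the old_*/new_* functions are
    hypothetical names, and the lock is a flag rather than a real vnode
    lock.  In the old style the high-level layer holds the lock across
    the call into the filesystem; in the new style it calls in unlocked
    and the filesystem locks for itself.

```c
#include <assert.h>

/* Hypothetical stand-in for a vnode; not the kernel structure. */
struct toy_vnode { int locked; int nlink; };

static void toy_vn_lock(struct toy_vnode *vp)   { assert(!vp->locked); vp->locked = 1; }
static void toy_vn_unlock(struct toy_vnode *vp) { assert(vp->locked);  vp->locked = 0; }

/* Old style: the filesystem op expects to be entered with the lock held. */
static int old_fs_remove(struct toy_vnode *vp)
{
    assert(vp->locked);
    if (vp->nlink == 0)
        return -1;
    vp->nlink--;
    return 0;
}

/*
 * Old style: the high-level layer holds the exclusive lock for the
 * whole call, however long the filesystem (possibly remote or in
 * userland) takes to answer.
 */
static int old_vfs_remove(struct toy_vnode *vp, int (*op)(struct toy_vnode *))
{
    int error;

    toy_vn_lock(vp);
    error = op(vp);
    toy_vn_unlock(vp);
    return error;
}

/* New style: the filesystem takes and drops the lock itself. */
static int new_fs_remove(struct toy_vnode *vp)
{
    int error = 0;

    toy_vn_lock(vp);
    if (vp->nlink == 0)
        error = -1;
    else
        vp->nlink--;
    toy_vn_unlock(vp);
    return error;
}

/* New style: the high-level layer calls in without holding the lock. */
static int new_vfs_remove(struct toy_vnode *vp, int (*op)(struct toy_vnode *))
{
    assert(!vp->locked);
    return op(vp);
}
```

    The same amount of locking happens in both paths.  What changes is
    who holds the lock while waiting on the filesystem.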

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>