VOP_RENAME of the future

Sun Aug 6 16:09:36 PDT 2006

Hi,

VOP_RENAME is a notorious beast, and, IIRC, Matt (and maybe others) has
claimed that it should be tamed. I wonder what we are to expect here...

As I see, the main issues are the following:

1) Many things have to be locked at the same time.
2) It has to be checked that the target dir is not under the source dir
   in the fs tree.
3) It's hard to ensure consistency (e.g., keeping the result of the check
   in 2) valid) when other topology-modifying events (typically, another
   rename) can occur during the rename.

The ideal would be to handle these effectively at the VFS level, so
that an fs implementer wouldn't have to care about them at all, and
everything would still be just fine.

Let's see how it's tackled in the various open Unices.

Linux goes for the nice API and gets it via a blunt solution: the Linux
VFS does take care of 1)-3). For 3), they just have a per-vfs rename
lock which ensures that only one rename can run at a time.
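For illustration, here is a minimal userland sketch (C with pthreads) of what such a per-filesystem rename lock boils down to. struct toy_fs, toy_rename() and run_concurrent_renames() are made-up names, not Linux's actual code. The point is only that every topology change on one fs runs under a single mutex, so at most one rename is ever in flight.

```c
#include <pthread.h>

/* Hypothetical per-filesystem state; only the rename lock matters here. */
struct toy_fs {
    pthread_mutex_t rename_lock; /* serializes every rename on this fs */
    pthread_mutex_t stats_lock;  /* protects the two counters below */
    int active;                  /* renames currently inside the lock */
    int max_active;              /* highest value 'active' ever reached */
};

void toy_fs_init(struct toy_fs *fs)
{
    pthread_mutex_init(&fs->rename_lock, NULL);
    pthread_mutex_init(&fs->stats_lock, NULL);
    fs->active = 0;
    fs->max_active = 0;
}

/* Hypothetical rename entry point: everything that inspects or changes the
 * directory topology runs with rename_lock held, so it cannot interleave
 * with another rename on the same filesystem. */
void toy_rename(struct toy_fs *fs)
{
    pthread_mutex_lock(&fs->rename_lock);

    pthread_mutex_lock(&fs->stats_lock);
    if (++fs->active > fs->max_active)
        fs->max_active = fs->active;
    pthread_mutex_unlock(&fs->stats_lock);

    /* ... check that the target is not under the source, then
     *     lock the vnodes involved and move the directory entry ... */

    pthread_mutex_lock(&fs->stats_lock);
    fs->active--;
    pthread_mutex_unlock(&fs->stats_lock);

    pthread_mutex_unlock(&fs->rename_lock);
}

static void *rename_worker(void *arg)
{
    struct toy_fs *fs = arg;
    for (int i = 0; i < 10000; i++)
        toy_rename(fs);
    return NULL;
}

/* Runs 'nthreads' (at most 16) threads doing renames concurrently and
 * returns how many renames were ever in progress at the same time. */
int run_concurrent_renames(struct toy_fs *fs, int nthreads)
{
    pthread_t t[16];
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, rename_worker, fs);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return fs->max_active;
}
```

With a few threads hammering toy_rename(), run_concurrent_renames() should report 1. The price is just as obvious: renames in completely unrelated directories of the same fs get serialized too.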

Based on remarks in the OpenSolaris code, Solaris 9 used to do this,
too.

Sol 10 and the BSDs chose a more fine-grained scheme, but the work has
to be done for each particular VFS. However, my impression is that the
SUN guys have actually taken the effort to do it carefully enough, while
the BSDs just let the old hacky 4.4BSD code peacefully rot further. At
least, all four of them have the lines

        if ((error = vn_lock(fvp, LK_EXCLUSIVE)) != 0)
                goto abortit;

in their ufs_rename(), i.e., let's treat issue 1) naively, and if that
fails, well... goddam the bloody thing. That doesn't look much like the
product of a well-thought-out design. (These are really just my
impressions after a quick run-through of the code, so please take these
judgements with a grain of salt. It's not intended as flame bait.)
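A common textbook way to avoid the deadlock that naive, one-after-the-other locking risks is to acquire all the locks a rename needs in a single global order, e.g. by address. The sketch below applies that generic technique to a hypothetical struct toy_vnode holding a pthread mutex. It is not what any of the ufs_rename() implementations does, and it ignores that a real vnode lock can fail (the case the goto abortit handles).

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a vnode: just a lock. */
struct toy_vnode {
    pthread_mutex_t lock;
};

/* Lock two vnodes in a fixed global order (by address), so two renames
 * that need the same pair in opposite roles cannot deadlock (ABBA). */
void lock_vnode_pair(struct toy_vnode *a, struct toy_vnode *b)
{
    if (a == b) {               /* e.g. a rename within one directory */
        pthread_mutex_lock(&a->lock);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        struct toy_vnode *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

/* Release order does not matter for deadlock avoidance. */
void unlock_vnode_pair(struct toy_vnode *a, struct toy_vnode *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}
```

Two renames that take the same two directories in opposite roles then lock them in the same order, so neither can sit on one lock while waiting for the other. With up to four vnodes involved in a rename, the same idea means sorting all of them before locking any.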

In particular, what does the picture look like in DragonFly?

1) has been partially solved, because namecache locking is taken care of
by the VFS layer. However, vnode-level locking still has to be done by
the particular file systems (and it suffers from issues like the
presence of the above two lines). I wonder whether updating the
filesystems to use the namecache API directly will automagically solve
this, or whether there is still some hard work to be done at the VFS
level...

2) is not yet handled by the VFS, but since we have the namecache code,
I think it would be easy to implement in a generic way (just like Linux
does with its dentry thingy). Of course, this won't really be effective
without dealing with 3)...

As for 3), I really don't see anything one could do at the VFS level
apart from the ugly big rename lock...

So, whatcha gonna do about it?

Csaba