vfsx18.patch available - vfs work / NFS server (expert developers only)

Matthew Dillon dillon at apollo.backplane.com
Sat Nov 6 18:49:26 PST 2004

    Another regular vfs patch.  This one attempts to fix the NFS server,
    but requires serious testing and some bugs are expected.

	fetch http://leaf.dragonflybsd.org/~dillon/vfsx18.patch

    NFS server work:  The NFS server is handed file handles over the network
    which it must convert into vnodes.  For lookup requests it must convert
    a filehandle representing a directory vnode into a directory vnode and
    from there into a namecache pointer.  The problem is that this directory
    vnode may be out in the middle of nowhere, with no namecache
    infrastructure leading up to it.

    There are two possible ways to fix this.  The first one, which I already
    tried (and failed with), is to allow disconnected namecache records to exist
    in the namecache topology.  One can then assign an unnamed namecache
    structure to the vnode in question and perform the lookup, only
    reconnecting the record to the rest of the namecache hierarchy later on
    in an opportunistic fashion.  Even using this method there is still a
    serious issue with ".." lookups, which must work, so it does not allow
    the NFS server to avoid doing a ".." lookup.

    Christoph Hellwig sent me a nice email outlining how Linux handles these
    issues.  It turns out that Linux is in fact using the above solution,
    complete with a separate VFS function to look up the parent directory,
    but it apparently took them years to stabilize it.  I can well understand
    the difficulty since my attempt on Friday resulted in failure :-)


    The second way to fix the problem is to actually track the directory
    backwards through ".." until we hit a directory which already has a
    namecache record associated with it, or until we hit the root of the
    filesystem, and then reconstruct the namecache topology forwards to
    create a fully connected topology leading to the requested vnode. 
    Reconstructing the topology actually requires a directory scan to 
    locate the name associated with the inode number we are trying to find.
    That is: take the target directory, look up ".." (the parent directory),
    then scan the parent directory to find the entry for the target directory.
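
    The two phases described above can be sketched as a toy userland model.
    This is only an illustration under stated assumptions, not the code in
    the patch: struct dir, struct dirent_toy, scan_for_ino() and reconnect()
    are hypothetical names, the "cached" flag stands in for "a namecache
    record exists", and the "dotdot" pointer stands in for a ".." lookup.

```c
/* Toy userland model, NOT DragonFly kernel code; every name is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXENT   8
#define MAXDEPTH 32

struct dir;

struct dirent_toy {             /* one directory entry: name -> child */
    const char *name;
    struct dir *child;
};

struct dir {
    int ino;                    /* inode number */
    struct dir *dotdot;         /* result of a ".." lookup; root points at itself */
    struct dirent_toy ent[MAXENT];
    int nent;
    int cached;                 /* nonzero if a namecache record already exists */
    const char *ncname;         /* name held by that record */
};

/* The expensive step: scan parent for the entry with a matching inode number. */
static const char *scan_for_ino(struct dir *parent, int ino)
{
    for (int i = 0; i < parent->nent; i++)
        if (parent->ent[i].child->ino == ino)
            return parent->ent[i].name;
    return NULL;
}

/*
 * Connect target to the cached part of the tree.
 * Returns the number of directory scans performed, or -1 on failure.
 */
static int reconnect(struct dir *target)
{
    struct dir *chain[MAXDEPTH];
    int depth = 0, scans = 0;

    /* Phase 1: walk backwards through ".." until a cached dir or the root. */
    for (struct dir *d = target; !d->cached; d = d->dotdot) {
        if (d->dotdot == d) {   /* filesystem root needs no scan */
            d->cached = 1;
            d->ncname = "/";
            break;
        }
        if (depth == MAXDEPTH)
            return -1;
        chain[depth++] = d;
    }

    /* Phase 2: rebuild forwards, starting nearest the cached ancestor. */
    while (depth > 0) {
        struct dir *d = chain[--depth];
        assert(d->dotdot->cached);      /* parent is always connected first */
        const char *name = scan_for_ino(d->dotdot, d->ino);
        if (name == NULL)
            return -1;
        d->ncname = name;
        d->cached = 1;
        scans++;
    }
    return scans;
}
```

    In this model a second reconnect() call on the same directory performs
    no scans at all, because every directory on the path is already marked
    as cached.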

    * This solution has the advantage of being compartmentalized... only NFS
    needs to use it, and we do not pollute the rest of the namecache API
    with a mess of code to deal with reconnecting disconnected namecache
    records.

    * This solution has the disadvantage of being inefficient... one has to 
    actually scan a directory to figure out the name.

    * But, through testing with buildworld I've determined that directories
    tend to be well cached on the NFS server and thus most (99.999%) of the
    time the NFS server does not have to actually scan a directory.

    So I have chosen the second solution, and that is what is in this
    patch.  I also, like Linux, introduced a new VOP call,
    vop_nlookupdotdot(), to look up the parent directory, since that is
    required for either solution.

    My buildworld loop seems to be working fairly well, but I expect there
    to be a few operational bugs.  I had one instance in a slightly earlier
    patch set where a softlink got garbled.  I'm not sure how it happened
    and it did not repeat, but there are almost certainly some issues with
    this patch set.

    This patch set has a debug sysctl, debug.ncvp_debug.  Setting this to 1
    will spew out summary information whenever the NFS server has to resort
    to a directory scan.  Setting it to 2 spews out the actual directory
    scan as well.  Setting it to 3 forces the NFS server to ignore any
    existing namecache record and *ALWAYS* do a directory scan, and will
    generate a huge amount of console spew as well.  I use 3 only to try
    to exercise the code paths.

    TODO: fix whatever bugs pop out of the woodwork, unionfs, and nullfs.

					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>
