vfsx18.patch available - vfs work / NFS server (expert developers only)
dillon at apollo.backplane.com
Sat Nov 6 18:49:26 PST 2004
Another regular vfs patch. This one attempts to fix the NFS server,
but requires serious testing and some bugs are expected.
NFS server work: The NFS server is handed file handles over the network
which it must convert into vnodes. For lookup requests it must convert
a filehandle representing a directory vnode into a directory vnode and
from there into a namecache pointer. The problem is that this directory
vnode may be out in the middle of nowhere, with no namecache
infrastructure leading up to it.
There are two possible ways to fix this. The first one, which I already
tried (and failed with), is to allow disconnected namecache records to exist
in the namecache topology. One can then assign an unnamed namecache
structure to the vnode in question and perform the lookup, only
reconnecting the record to the rest of the namecache hierarchy later on
in an opportunistic fashion. Even using this method there is still a
serious issue with ".." lookups, which must work, so it does not allow
the NFS server to avoid doing a ".." lookup.
Christoph Hellwig sent me a nice email outlining how Linux handles these
issues. It turns out that Linux is in fact using the above solution,
complete with a separate VFS function to look up the parent directory,
but it apparently took them years to stabilize it. I can well understand
the difficulty since my attempt on Friday resulted in failure :-)
The second way to fix the problem is to actually track the directory
backwards through ".." until we hit a directory which already has
a namecache record associated with it, or until we hit the root of the
filesystem, and then reconstruct the namecache topology forwards to
create a fully connected topology leading to the requested vnode.
Reconstructing the topology actually requires a directory scan to
locate the name associated with the inode number we are trying to find.
That is: take the target directory, look up ".." (the parent directory),
then scan the parent directory to find the target directory's name.
* This solution has the advantage of being compartmentalized... only NFS
needs to use it, and we do not pollute the rest of the namecache API
with a mess of code to deal with reconnecting disconnected namecache
records.
* This solution has the disadvantage of being inefficient... one has to
actually scan a directory to figure out the name.
* But, through testing with buildworld I've determined that directories
tend to be well cached on the NFS server and thus most (99.999%) of the
time the NFS server does not have to actually scan a directory.
So I have chosen the second solution, and that is what is in this
patch. I also, like Linux, introduced a new VOP call to look up the
parent directory, called vop_nlookupdotdot(), since that is required
for either solution.
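As a rough illustration of the second approach, the following is a userland
analogy only, not the kernel code in this patch. It assumes a POSIX system,
and name_in_parent() and rebuild_path() are hypothetical helper names. It
looks up ".." for a directory, scans the parent for the entry with the
matching device/inode pair, and repeats upward, assembling the path
root-first. Unlike the approach described above, this sketch always walks
all the way to the root rather than stopping at the first directory that
already has a namecache record.

```c
#define _XOPEN_SOURCE 700   /* for fstatat(), dirfd() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define PATH_BUF 4096       /* arbitrary limit chosen for this sketch */

/*
 * Find the name under which directory `dir` appears in its parent:
 * look up "..", then scan it for an entry with the same device and
 * inode number. Returns 0 on success, -1 on error or when no entry
 * matches (normally the case for the root directory).
 */
static int
name_in_parent(const char *dir, char *name, size_t namelen)
{
    char parent[PATH_BUF];
    struct stat target, st;
    struct dirent *de;
    DIR *dp;
    int found = -1;

    assert(dir != NULL && name != NULL);
    if (snprintf(parent, sizeof(parent), "%s/..", dir) >= (int)sizeof(parent))
        return -1;
    if (stat(dir, &target) != 0 || (dp = opendir(parent)) == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        /* "." and ".." can never be the name we are looking for. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (fstatat(dirfd(dp), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (st.st_dev == target.st_dev && st.st_ino == target.st_ino &&
            strlen(de->d_name) < namelen) {
            strcpy(name, de->d_name);
            found = 0;
            break;
        }
    }
    closedir(dp);
    return found;
}

/*
 * Walk backwards through ".." until no further name can be found,
 * prepending each discovered component so that `out` ends up holding
 * a path from the root forwards. An error partway up is treated the
 * same as reaching the root, which a real implementation must not do.
 */
static int
rebuild_path(const char *dir, char *out, size_t outlen)
{
    char cur[PATH_BUF], name[256], tmp[PATH_BUF];

    if (outlen < 2 || strlen(dir) >= sizeof(cur))
        return -1;
    strcpy(cur, dir);
    out[0] = '\0';
    while (name_in_parent(cur, name, sizeof(name)) == 0) {
        if (snprintf(tmp, sizeof(tmp), "/%s%s", name, out) >= (int)sizeof(tmp) ||
            strlen(tmp) >= outlen || strlen(cur) + 4 > sizeof(cur))
            return -1;
        strcpy(out, tmp);       /* components are found leaf-first */
        strcat(cur, "/..");     /* step up one level */
    }
    if (out[0] == '\0')
        strcpy(out, "/");
    return 0;
}
```

Each step up costs one full scan of the parent directory, which is the
inefficiency noted in the list above. That cost is only acceptable because
the scan is the rare fallback path.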
My buildworld loop seems to be working fairly well, but I expect there
to be a few operational bugs. I had one instance in a slightly earlier
patch set where a softlink got garbled. I'm not sure how it happened
and it did not repeat, but there are almost certainly some issues with
this patch set.
This patch set has a debug sysctl, debug.ncvp_debug. Setting this to 1
will spew out summary information whenever the NFS server has to resort
to a directory scan. Setting it to 2 spews out the actual directory
scan as well. Setting it to 3 forces the NFS server to ignore any
existing namecache record and *ALWAYS* do a directory scan, and will
generate a huge amount of console spew as well. I use 3 only to try
to exercise the code paths.
TODO: fix whatever bugs pop out of the woodwork, unionfs, and nullfs.
<dillon at xxxxxxxxxxxxx>