Name cache question
Matthew Dillon
dillon at apollo.backplane.com
Sat Dec 10 15:04:49 PST 2005
:Hi all,
:
:I am interested in exploring the option of operating DragonFly with non-
:traditional filesystem layouts that make use of features like
:hard-linked directories, nullfs/unionfs-type mounts, per-process
:namespace, etc, but I'm having trouble figuring out how it would work
:with the new namecache system.
:
:I did try nullfs and got very bizarre results. nullfs was using the
:old VFS interface, but when I looked at updating it, I had several
:concerns.
nullfs and unionfs are in a non-working state. I would be amazed if
you managed to even mount something with either! They need a complete
rewrite.
:It appears that we are representing a (dvp,cn) -> vp association with
:up to n distinct namecache structures, where n is the number of unique
:nc_parent chains that lead to vp. As a consequence:
Yes, this is correct.
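
    To make that concrete, here is a much-simplified sketch of the
    record involved.  The real struct namecache in sys/sys/namecache.h
    has many more fields, so take the layout below as illustrative only:

	struct vnode;

	/*
	 * Simplified sketch of a namecache record.  Each record
	 * represents one (parent, name) -> vnode association, so a
	 * vnode reachable via N distinct parent chains ends up with
	 * N distinct records.
	 */
	struct namecache_sketch {
	    struct namecache_sketch *nc_parent; /* parent directory record */
	    struct vnode            *nc_vp;     /* resolved vnode, or NULL */
	    char                    *nc_name;   /* path component name     */
	    int                      nc_nlen;   /* length of nc_name       */
	};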
:1. In the general case that the filesystem layout is a DAG, because the
:namecache is a tree, we end up with a potentially high ratio of ncp's
:to the vnode-associations that they are supposed to represent;
The ratio will be limited for nullfs/unionfs style mounts. The
moment you introduce directory hardlinks, however, the ability
to create circular dependencies can lead to long directory chains.
The real question is in the practical nature of supporting such chaining.
If it's important to be able to CD into a tree and ".." back out of it
on a 1:1 basis (without the ".." shortcutting the operation), then there
is no way around the issue other than to limit the allowed depth.
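
    For illustration, a depth cap on the parent chain might look like
    the sketch below.  NC_MAXDEPTH and the walk itself are hypothetical,
    not existing DragonFly code, and this reuses the simplified record
    sketched above:

	#define NC_MAXDEPTH 128  /* hypothetical cap on chain length */

	/*
	 * Walk nc_parent links toward the root, failing if the chain
	 * exceeds the allowed depth.  With hardlinked directories a
	 * cycle can make the chain unbounded, so a cap like this is
	 * the only defense when ".." must undo each descent on a
	 * strict 1:1 basis.
	 */
	static int
	nc_check_depth(struct namecache_sketch *ncp)
	{
	    int depth = 0;

	    while (ncp != NULL) {
		if (++depth > NC_MAXDEPTH)
		    return (-1);    /* chain too long, likely cyclic */
		ncp = ncp->nc_parent;
	    }
	    return (depth);
	}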
DragonFly's namecache really requires that the path leading to any
active node be present in the namecache. We break the rule somewhat
on NFS servers due to the way NFS file handles work, but that's the
goal.
:2. There's no way in the API to express the notion that a namecache
:entry is an alias of another one. v_namecache particularly does not
:help, because that indexes * -> vp associations, which may be totally
:independent (like normal hard-links to files). In addition in stacking
:FS scenarios, the vp's may be totally different and not show in the
:v_namecache at all. So we can't even conservatively fall back to, say,
:doing cache_inval_vp exclusively to preserve coherency.
Determining whether a file or directory is an alias of another file
or directory is not something the namecache can easily do. The problem
goes way beyond simply knowing whether something is an alias or not...
it extends into knowing what portions of the cached data related to
the vnode are cache coherent or stale. What is needed is an entire
cache coherency subsystem in and of itself.
I've mentioned the concept of a cache coherency layer numerous times.
We do not have such a layer. It's a very complex undertaking especially
considering that we want to extend it to a clustered solution. This
is why I have made no attempt whatsoever to integrate cache coherency
into the namecache. It's just too big a beast.
Traditionally aliasing is determined by looking at the actual inode
number (e.g. as returned by stat()). This still works on a rough,
granular level, but even this simple concept can become quite complex
when we start talking about filesystem snapshots and copy-on-write
filesystems such as Sun's ZFS.
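
    In userland terms the rough check is simply comparing the device
    and inode numbers returned by stat(); two names are aliases of the
    same file only when both match:

	#include <sys/stat.h>

	/*
	 * Return 1 if the two paths name the same underlying file
	 * (hard-link aliases), 0 if not, -1 on error.  The inode
	 * number alone is not enough; it is only unique within a
	 * single filesystem, so the device number must match too.
	 */
	static int
	is_alias(const char *a, const char *b)
	{
	    struct stat sa, sb;

	    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
	}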
:3. Because the system has no knowledge of which ncp's are aliases of
:which, we don't get the benefit of caching for those aliases. The
:more we DAG-ify the filesystem, the less benefit of namecache we get.
The caching occurs at the vnode/inode level, not the namecache level.
So we do get the benefit to a degree.
Theoretically we could even allow loops in the namecache topology,
where we keep track of the number of times we have recursed into a cyclic
node, but for practical use I think it is easier to keep the algorithm
simple and allocate separate copies of the namecache record so each
record gives us a unique 'path', whether aliased or not.
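
    As a concrete example of the "separate copies" approach: if /a/x
    and /b/x are hard links to the same directory, the cache ends up
    holding two records that resolve to the same vnode but carry
    different parent chains.  Again this uses the simplified record
    from above, and all the names are illustrative:

	struct vnode;
	extern struct vnode shared_vp;  /* the one underlying vnode */

	/* Records for the two parent directories, /a and /b. */
	static struct namecache_sketch ncp_a = { NULL, NULL, "a", 1 };
	static struct namecache_sketch ncp_b = { NULL, NULL, "b", 1 };

	/* Two records, two distinct parent chains, one vnode. */
	static struct namecache_sketch ncp_ax = { &ncp_a, &shared_vp, "x", 1 };
	static struct namecache_sketch ncp_bx = { &ncp_b, &shared_vp, "x", 1 };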
:I understand that some of these design decisions were motivated by the
:desire to have ../fullpath handled by the namecache, but I wonder if that
:will prove to be a high price to pay in terms of efficiency and API
:complexity when we move ahead to per-process VFS environments and other
:non-traditional uses of the filesystem.
No, that was just a nice side effect. There are several ".." related
issues. Probably the biggest issue is simply coming up with a design
that allows us to avoid deadlocking path searches, operations such as
rename, and locking namecache entries in general. If you look at
how traditional BSD code deals with ".." you will see that it is far
worse than the rewrite I did for DragonFly. In fact, I even took a
page from Linux's playbook in implementing a specific ".." VOP in
order to handle it as a separate case.
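
    The shape of such a dedicated op is roughly the sketch below.  The
    actual DragonFly entry point is VOP_NLOOKUPDOTDOT(); the function
    and argument names here are illustrative, not the real signature:

	struct vnode;
	struct ucred;

	/*
	 * Sketch of a dedicated ".." lookup operation.  Keeping ".."
	 * out of the normal component lookup lets the filesystem
	 * special-case the child-to-parent locking order that
	 * otherwise invites deadlock.
	 */
	static int
	fs_nlookupdotdot(struct vnode *dvp, struct vnode **vpp,
			 struct ucred *cred)
	{
	    /*
	     * A real implementation reads the ".." entry of dvp and
	     * returns the parent vnode in *vpp, dropping dvp's lock
	     * first so the parent can be locked without reversing
	     * the usual parent-before-child lock order.
	     */
	    *vpp = NULL;
	    return (0);     /* stub only */
	}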
:How should we proceed? How should the namecache of an upper-level FS be
:invalidated as a result of activity on the lower-level FS?
:
:-Eric
This is what a cache coherency layer would be tasked with addressing.
Certain filesystems, such as traditional NFSv3, simply do not have
the ability to track changes backwards. That doesn't mean we can't
implement a fully coherent caching infrastructure for DragonFly; it
simply means that certain filesystems won't be able to make full use
of it or might have to proactively check cached data for changes or
something like that.
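
    For a filesystem like NFSv3 the proactive check typically amounts
    to refetching attributes and comparing timestamps before trusting
    cached data.  A minimal sketch, with hypothetical names:

	#include <sys/time.h>

	/* Attributes remembered when the data was cached. */
	struct cached_attrs {
	    struct timespec ca_mtime;
	};

	/*
	 * Compare a freshly fetched modification time against the
	 * cached one; any difference means the cached data must be
	 * invalidated rather than reused.
	 */
	static int
	attrs_stale(const struct cached_attrs *ca,
		    const struct timespec *fresh)
	{
	    return (ca->ca_mtime.tv_sec != fresh->tv_sec ||
		    ca->ca_mtime.tv_nsec != fresh->tv_nsec);
	}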
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>