Name cache question
Matthew Dillon
dillon at apollo.backplane.com
Sat Dec 10 15:04:49 PST 2005
:Hi all,
:
:I am interested in exploring the option of operating DragonFly with non-
:traditional filesystem layouts that make use of features like
:hard-linked directories, nullfs/unionfs-type mounts, per-process
:namespace, etc, but I'm having trouble figuring out how it would work
:with the new namecache system.
:
:I did try nullfs and got very bizarre results. nullfs was using the
:old VFS interface, but when I looked at updating it, I had several
:concerns.
nullfs and unionfs are in a non-working state. I would be amazed if
you managed to even mount something with either! They need a complete
rewrite.
:It appears that we are representing a (dvp,cn) -> vp association with
:up to n distinct namecache structures, where n is the number of unique
:nc_parent chains that lead to vp. As a consequence:
Yes, this is correct.
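
    To make that concrete, here is a much-simplified sketch of the
    record involved.  The real struct namecache in sys/sys/namecache.h
    has many more fields, so take the layout below as illustrative only:

	struct vnode;

	/*
	 * Simplified sketch of a namecache record.  Each record
	 * represents one (parent, name) -> vnode association, so a
	 * vnode reachable via N distinct parent chains ends up with
	 * N distinct records.
	 */
	struct namecache_sketch {
	    struct namecache_sketch *nc_parent; /* parent directory record */
	    struct vnode            *nc_vp;     /* resolved vnode, or NULL */
	    char                    *nc_name;   /* path component name     */
	    int                      nc_nlen;   /* length of nc_name       */
	};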
:1. In the general case that the filesystem layout is a DAG, because the
:namecache is a tree, we end up with a potentially high ratio of ncp's
:to the vnode-associations that they are supposed to represent;
The ratio will be limited for nullfs/unionfs style mounts. The
moment you introduce directory hardlinks, however, the ability
to create circular dependencies can lead to long directory chains.
The real question is in the practical nature of supporting such chaining.
If it's important to be able to CD into a tree and ".." back out of it
on a 1:1 basis (without the ".." shortcutting the operation), then there
is no way around the issue other than to limit the allowed depth.
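
    For illustration, a depth cap on the parent chain might look like
    the sketch below.  NC_MAXDEPTH and the walk itself are hypothetical,
    not existing DragonFly code, and this reuses the simplified record
    sketched above:

	#define NC_MAXDEPTH 128  /* hypothetical cap on chain length */

	/*
	 * Walk nc_parent links toward the root, failing if the chain
	 * exceeds the allowed depth.  With hardlinked directories a
	 * cycle can make the chain unbounded, so a cap like this is
	 * the only defense when ".." must undo each descent on a
	 * strict 1:1 basis.
	 */
	static int
	nc_check_depth(struct namecache_sketch *ncp)
	{
	    int depth = 0;

	    while (ncp != NULL) {
		if (++depth > NC_MAXDEPTH)
		    return (-1);    /* chain too long, likely cyclic */
		ncp = ncp->nc_parent;
	    }
	    return (depth);
	}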
DragonFly's namecache really requires that the path leading to any
active node be present in the namecache. We break the rule somewhat
on NFS servers due to the way NFS file handles work, but that's the
goal.
:2. There's no way in the API to express the notion that a namecache
:entry is an alias of another one. v_namecache particularly does not
:help, because that indexes * -> vp associations, which may be totally
:independent (like normal hard-links to files). In addition in stacking
:FS scenarios, the vp's may be totally different and not show in the
:v_namecache at all. So we can't even conservatively fall back to, say,
:doing cache_inval_vp exclusively to preserve coherency.
Determining whether a file or directory is an alias of another file
or directory is not something the namecache can easily do. The problem
goes way beyond simply knowing whether something is an alias or not...
it extends into knowing what portions of the cached data related to
the vnode are cache coherent or stale. What is needed is an entire
cache coherency subsystem in and of itself.
I've mentioned the concept of a cache coherency layer numerous times.
We do not have such a layer. It's a very complex undertaking especially
considering that we want to extend it to a clustered solution. This
is why I have made no attempt whatsoever to integrate cache coherency
into the namecache. It's just too big a beast.
Traditionally aliasing is determined by looking at the actual inode
number (e.g. as returned by stat()). This still works on a rough,
granular level, but even this simple concept can become quite complex
when we start talking about filesystem snapshots and copy-on-write
filesystems such as Sun's ZFS.
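
    In userland terms the rough check is simply comparing the device
    and inode numbers returned by stat(); two names are aliases of the
    same file only when both match:

	#include <sys/stat.h>

	/*
	 * Return 1 if the two paths name the same underlying file
	 * (hard-link aliases), 0 if not, -1 on error.  The inode
	 * number alone is not enough; it is only unique within a
	 * single filesystem, so the device number must match too.
	 */
	static int
	is_alias(const char *a, const char *b)
	{
	    struct stat sa, sb;

	    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
	}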
:3. Because the system has no knowledge of which ncp's are aliases of
:which, we don't get the benefit of caching for those aliases. The
:more we DAG-ify the filesystem, the less benefit of namecache we get.
The caching occurs at the vnode/inode level, not the namecache level.
So we do get the benefit to a degree.
Theoretically we could even allow loops in the namecache topology,
where we keep track of the number of times we have recursed into a cyclic
node, but for practical use I think it is easier to keep the algorithm
simple and allocate separate copies of the namecache record so each
record gives us a unique 'path', whether aliased or not.
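
    As a concrete example of the "separate copies" approach: if /a/x
    and /b/x are hard links to the same directory, the cache ends up
    holding two records that resolve to the same vnode but carry
    different parent chains.  Again this uses the simplified record
    from above, and all the names are illustrative:

	struct vnode;
	extern struct vnode shared_vp;  /* the one underlying vnode */

	/* Records for the two parent directories, /a and /b. */
	static struct namecache_sketch ncp_a = { NULL, NULL, "a", 1 };
	static struct namecache_sketch ncp_b = { NULL, NULL, "b", 1 };

	/* Two records, two distinct parent chains, one vnode. */
	static struct namecache_sketch ncp_ax = { &ncp_a, &shared_vp, "x", 1 };
	static struct namecache_sketch ncp_bx = { &ncp_b, &shared_vp, "x", 1 };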
:I understand that some of these design decisions were motivated by the
:desire to have ../fullpath handled by the namecache, but I wonder if that
:will prove to be a high price to pay in terms of efficiency and API
:complexity when we move ahead to per-process VFS environments and other
:non-traditional uses of the filesystem.
No, that was just a nice side effect. There are several ".." related
issues. Probably the biggest issue is simply coming up with a design
that allows us to avoid deadlocking path searches, operations such as
rename, and locking namecache entries in general. If you look at
how traditional BSD code deals with ".." you will see that it is far
worse than the rewrite I did for DragonFly. In fact, I even took a
page from Linux's playbook in implementing a specific ".." VOP in
order to handle it as a separate case.
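
    The shape of such a dedicated op is roughly the sketch below.  The
    actual DragonFly entry point is VOP_NLOOKUPDOTDOT(); the function
    and argument names here are illustrative, not the real signature:

	struct vnode;
	struct ucred;

	/*
	 * Sketch of a dedicated ".." lookup operation.  Keeping ".."
	 * out of the normal component lookup lets the filesystem
	 * special-case the child-to-parent locking order that
	 * otherwise invites deadlock.
	 */
	static int
	fs_nlookupdotdot(struct vnode *dvp, struct vnode **vpp,
			 struct ucred *cred)
	{
	    /*
	     * A real implementation reads the ".." entry of dvp and
	     * returns the parent vnode in *vpp, dropping dvp's lock
	     * first so the parent can be locked without reversing
	     * the usual parent-before-child lock order.
	     */
	    *vpp = NULL;
	    return (0);     /* stub only */
	}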
:How should we proceed? How should the namecache of an upper-level FS be
:invalidated as a result of activity on the lower-level FS?
:
:-Eric
This is what a cache coherency layer would be tasked with addressing.
Certain filesystems, such as traditional NFSv3, simply do not have
the ability to track changes backwards. That doesn't mean we can't
implement a fully coherent caching infrastructure for DragonFly; it
simply means that certain filesystems won't be able to make full use
of it or might have to proactively check cached data for changes or
something like that.
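
    For a filesystem like NFSv3 the proactive check typically amounts
    to refetching attributes and comparing timestamps before trusting
    cached data.  A minimal sketch, with hypothetical names:

	#include <sys/time.h>

	/* Attributes remembered when the data was cached. */
	struct cached_attrs {
	    struct timespec ca_mtime;
	};

	/*
	 * Compare a freshly fetched modification time against the
	 * cached one; any difference means the cached data must be
	 * invalidated rather than reused.
	 */
	static int
	attrs_stale(const struct cached_attrs *ca,
		    const struct timespec *fresh)
	{
	    return (ca->ca_mtime.tv_sec != fresh->tv_sec ||
		    ca->ca_mtime.tv_nsec != fresh->tv_nsec);
	}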
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>