cache_lookup() work this week.

Matthew Dillon dillon at apollo.backplane.com
Thu Sep 4 11:28:30 PDT 2003


:...
:
:Consider also:
:    cd /a/b/c/d   ;  ln /a/outside
:Time passes, and someone else types:
:    cd /a/b/c     ;  rm -Rf *
:
:The person will think that they are just safely removing the
:directory and everything below it, but now they could be
:removing much more than that.  We'd have to do something to
:guard against that problem too.  (and these hard links could
:be created by users with nefarious purposes in mind, so the
:person doing the 'rm' would have no reason to suspect that
:this would be an issue).

    This is a good one, and easy to solve... rm would just unlink() or
    rmdir() the directory first, whether it is empty or not.  If the unlink
    succeeds then rm considers its work done.  

    The last instance of the directory would not be unlinkable... rm -rf
    would have to recurse through and delete the underlying files first.
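
    Something like the following captures the intent (a sketch only; it
    assumes the hard-link-directory rmdir() semantics proposed here, which
    no shipping rmdir() has, and remove_tree() is a hypothetical helper):

#include <errno.h>
#include <unistd.h>

static int remove_tree(const char *path);      /* hypothetical recursive helper */

/*
 * Sketch only, not rm's actual code.  Try to drop the directory link
 * itself first; if that succeeds we were holding one of several hard
 * links and we are done.  Only when it fails because this is the last
 * link (so the directory must be emptied first) do we recurse.
 */
static int
remove_dir(const char *path)
{
        if (rmdir(path) == 0)           /* extra hard link: link dropped, done */
                return (0);
        if (errno != ENOTEMPTY)         /* some other error, report it */
                return (-1);
        return (remove_tree(path));     /* last instance: empty it, then rmdir */
}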

:>     Only the (rdev,inode#) for the elements representing
:>     the current path need to be remembered so the memory
:>     use is small.
:
:If we're going to have real hard-links, then it would probably be
:important to add a "number-of-hard-dirlinks" field to stat().
:This would be a separate value from the st_nlink field, in that
:it would only count the number of directory hard-links.  Maybe
:call it st_ndirlink.  Then any program which wants to do this
:will only have to remember (rdev, inode) for those directories
:where this value is > 1.  That makes the overhead even less...

    Right.
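
    For instance, a recursive tree walk could be written roughly like this
    (st_ndirlink is the proposed field, not an existing one, and
    remember_dir() is a hypothetical loop-detection helper):

#include <sys/stat.h>

static int remember_dir(dev_t dev, ino_t ino);  /* hypothetical: 1 if seen before */

/*
 * Sketch only.  Only directories reachable by more than one name need
 * to be recorded, so the (dev, inode) table stays small.
 */
static int
check_dir(const char *path)
{
        struct stat st;

        if (lstat(path, &st) < 0)
                return (-1);
        if (st.st_ndirlink <= 1)        /* only one name: no loop possible */
                return (0);
        return (remember_dir(st.st_dev, st.st_ino));
}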

:
:This field might also provide a way to address the 'rm' problem
:mentioned above.  If st_ndirlink > 1, then just destroy the hard
:link and do *not* remove the files underneath the hard link.
:
:But I'm still very uneasy with the idea of real hard links on
:directories.  I think it's too much potential for trouble
:without enough of a benefit.

    Right.

    Oh, don't get me wrong, I am both uneasy and thrilled about the prospect.
    I think it is worth having for precisely that reason :-)

:>:>     and (B) it will be possible to implement semi-hard
:>:>     links, basically softlinks that *look* like hardlinks
:>:
:>:What will this be like?  Ie, what will be the difference
:>:between hard-links and semi-hard links?  Will this be
:>:something like the way OpenAFS handles volume-mounting?
:
:On this question, I'm just curious about the lower-level details,
:so it's different than my hard-links question.  I definitely
:like this idea, I was just wondering how it'd be implemented.
:From other messages it does sound like you intend to implement
:this in about the same way that OpenAFS does volume-mounting,
:which is what I was wondering.  Thanks.
:
:-- 
:Garance Alistair Drosehn            =   gad at xxxxxxxxxxxxxxxxxxxx

    It would be implemented as a softlink from the point of view of the
    filesystem, but namei() would interpret a flag on it to mean that the
    namecache should keep a separate chain through the link rather than
    'jump' through the link.

    Maybe this will help.  This is the new namecache structure I am
    using:

struct  namecache {
        LIST_ENTRY(namecache) nc_hash;  /* hash chain (parentvp,name) */
        TAILQ_ENTRY(namecache) nc_entry; /* scan via nc_parent->nc_list */
        TAILQ_ENTRY(namecache) nc_vnode; /* scan via vnode->v_namecache */
        TAILQ_HEAD(, namecache) nc_list; /* list of children */
        struct namecache *nc_parent;    /* namecache entry for parent */
        struct  vnode *nc_vp;           /* vnode representing name or NULL */
        int     nc_refs;                /* ref count prevents deletion */
        u_char  nc_flag;
        u_char  nc_nlen;                /* The length of the name, 255 max */
        char    nc_name[0];             /* The segment name (embedded) */
};

    And in the vnode:

        TAILQ_HEAD(namecache_list, namecache) v_namecache;

    What we had before was that the vnode served as the central coordinating
    point for the namecache entries, both namecache entries (parent
    directories) feeding into the vnode and namecache entries (children in
    the directory) going out of the vnode.

    What we have above is that the namecache entries now serve as the central
    coordinating point and the vnode merely heads a list of the namecache
    entries associated with it.

    With the new scheme it is possible to maintain completely independent
    naming topologies that contain some 'shared' vnodes.  In the old scheme
    you could do that but you would lose track of which naming topology
    was used to locate the vnode.  In the new scheme the handle *IS* the
    namecache structure and thus the topology used to locate the vnode is
    known, even if the vnode is shared amongst several topologies.
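
    As a rough illustration (a hypothetical debugging helper, not real
    kernel code), the path actually used to reach a vnode can be recovered
    from its namecache handle by walking nc_parent:

#include <stdio.h>

/*
 * Sketch only, using the struct namecache shown above.  Because the
 * handle is the namecache entry rather than the vnode, the lookup
 * path is recoverable even when the vnode is shared between several
 * naming topologies.
 */
static void
ncp_print_path(struct namecache *ncp)
{
        if (ncp->nc_parent != NULL)
                ncp_print_path(ncp->nc_parent);
        printf("/%.*s", ncp->nc_nlen, ncp->nc_name);
}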

    All I have to do, which is what I am working on right now, is change all
    the directory references in the codebase from vnodes to namecache pointers.

    For example, fd_cdir, fd_rdir, and fd_jdir in sys/filedesc.h need to
    be changed from vnode pointers to namecache pointers, and all the VOP_*
    functions which take 'directory vnodes' as arguments would now instead 
    take namecache pointers as arguments.  namei related functions which take
    and return directory vnodes would now have to take and return namecache
    pointers.  For that matter, these functions would have to take and
    return namecache pointers for everything, including file vnodes.
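
    For example, the filedesc change would look roughly like this
    (illustrative only; the real header has many more fields and the final
    field names may well differ):

/*
 * Illustrative only -- not the actual sys/filedesc.h change.  The
 * per-process current, root, and jail directories become namecache
 * pointers, so each process remembers which *name* it used to reach
 * the directory rather than just the shared vnode.
 */
struct filedesc {
        /* ...existing fields... */
        struct namecache *fd_cdir;      /* current directory (was struct vnode *) */
        struct namecache *fd_rdir;      /* root directory    (was struct vnode *) */
        struct namecache *fd_jdir;      /* jail directory    (was struct vnode *) */
        /* ...existing fields... */
};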

    This in turn will allow the lookup functions to gain holds on directories,
    files, and non-existent files (placeholders for create, rename) without
    having to obtain any vnode locks, which in turn allows us to completely
    get rid of the race-to-root problem as well as other common stalls
    associated with blocking I/O during directory lookup operations.
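
    The gist of that, in hedged form (the helper names here are assumptions,
    not the real interface):

/*
 * Sketch only.  A lookup can pin a path component by bumping nc_refs
 * on its namecache entry.  No vnode lock is taken, and nc_vp may even
 * be NULL (a placeholder for a create or rename target), so the hold
 * never blocks on directory I/O.
 */
static void
ncp_hold(struct namecache *ncp)
{
        ++ncp->nc_refs;                 /* ref count prevents deletion */
}

static void
ncp_drop(struct namecache *ncp)
{
        --ncp->nc_refs;
}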

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




