approach on getting nullfs to work again

Matthew Dillon dillon at apollo.backplane.com
Wed Feb 9 14:41:13 PST 2005


:[original private post to matt, but something in my mail system must eat
:mails, so i'm hoping that this one will get through]

    Nope, I'm just overloaded.

:What happens if a file or directory in the underlying filesystem is being
:renamed or deleted? Doesn't that mean that I need to adjust the namecache for
:the nullfs layer, too?
:...
:
:We thought of a solution: overlay filesystems must lock their covered (i'll
:call it "shadow") parallel namecache entries, too, if they are being locked.
:Whereas this is not complicated to implement in cache_lock(), there is
:another problem: the namecache doesn't know about overlay filesystems. it
:doesn't know that there exist shadow namecache entries. so there must be some
:way of communication between namecache and vfs, maybe some
:vop_cache_create()?
:
:now this got a rather long mail, thanks for your attention
:hoping for input,
:  simon

    Ok.  We have two problems.  The second is solved as you say... the 
    overlay filesystem itself is aware of the underlying filesystem and
    must lock the underlying namecache record.  That is fairly
    straightforward.
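
    A minimal sketch of that locking rule, assuming a simplified record
    with a hypothetical nc_shadow link from the overlay record to the
    record it covers.  The structure, the field names and the helper
    functions are illustrative assumptions, not the real namecache API,
    and the "lock" is a plain flag rather than a blocking kernel lock:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified namecache record; not the real kernel structure. */
struct namecache {
    int               nc_locked;  /* 1 while held exclusively */
    struct namecache *nc_shadow;  /* assumed link to the covered record, or NULL */
};

static void nc_lock(struct namecache *ncp)
{
    assert(ncp->nc_locked == 0);  /* a real lock would block here instead */
    ncp->nc_locked = 1;
}

static void nc_unlock(struct namecache *ncp)
{
    assert(ncp->nc_locked == 1);
    ncp->nc_locked = 0;
}

/* Overlay side: lock our own record first, then the record it covers. */
static void overlay_lock(struct namecache *ncp)
{
    nc_lock(ncp);
    if (ncp->nc_shadow != NULL)
        nc_lock(ncp->nc_shadow);
}

/* Release in the reverse order of acquisition. */
static void overlay_unlock(struct namecache *ncp)
{
    if (ncp->nc_shadow != NULL)
        nc_unlock(ncp->nc_shadow);
    nc_unlock(ncp);
}
```

    Always taking the overlay record before the underlying one gives a
    single lock order for operations that enter through the overlay.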

    The rename-in-underlying-filesystem problem is a cache-coherency issue,
    solved by our (not yet existent) cache coherency layer! :-)

    So the question begins:  Can we construct a minimal cache coherency
    layer that can be used to help build nullfs and unionfs but that will
    not have to be ripped out when we do the 'real' layer?  I think the answer
    is: yes, we can.  We can create a minimal cache coherency layer
    based on the vnode's v_namecache list.

    Then it becomes a question of how complex a layer should we try to create?
    Taking for example a rename() in the underlying filesystem... do we 
    want to try to propagate the rename to the overlay or do we simply want
    to invalidate the overlay?  I think to begin with we just want to 
    invalidate the overlay.

    When I designed the new namecache topology I considered the possibility
    of having to deal with multiple overlaid filesystems and made the
    vnode's v_namecache a list of namecache records instead of a pointer to
    a single record.  The idea being that instead of having nullfs fake-up
    vnodes (like it does in FreeBSD) we instead have it return the *actual*
    vnode and only fake-up the namecache topology.  The system has no problem
    with multiple namecache records referencing the same vnode.  This
    greatly reduces the burden on nullfs to translate VOP calls... it only
    has to deal with namecache related translations, it does NOT have to 
    deal with things like VOP_READ().  The notion of the 'current'
    directory is now a namecache record in DragonFly, so we can get away 
    with this without confusing someone CD'd into a nullfs filesystem.
    (In FreeBSD the 'current directory' is a vnode and hence nullfs and
    unionfs had to fake-up the vnode.  In DragonFly it is a namecache
    pointer and we do NOT have to fake-up the vnode).
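
    As a rough illustration of that topology, here is a simplified
    sketch in which two namecache records (one in the underlying
    topology, one in a nullfs overlay) reference the same vnode, and the
    current directory is held as a namecache pointer.  Only v_namecache
    is named in the text above; the other fields, nc_setvp() and
    struct proc_cwd are assumptions made for the example and do not
    claim to match the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

/* Hypothetical, simplified namecache record. */
struct namecache {
    const char       *nc_name;    /* path component */
    struct namecache *nc_parent;  /* parent record in its own topology */
    struct vnode     *nc_vp;      /* the actual vnode, shared across topologies */
    struct namecache *nc_vnext;   /* next record on the vnode's list */
};

/* Hypothetical, simplified vnode. */
struct vnode {
    struct namecache *v_namecache;  /* head of the list of records naming this vnode */
    int               v_nrecords;   /* number of records on that list */
};

/* Attach a record to a vnode; several records may share one vnode. */
static void nc_setvp(struct namecache *ncp, struct vnode *vp)
{
    ncp->nc_vp = vp;
    ncp->nc_vnext = vp->v_namecache;
    vp->v_namecache = ncp;
    vp->v_nrecords++;
}

/* The current directory is a namecache record, not a vnode. */
struct proc_cwd {
    struct namecache *cwd_ncp;
};
```

    A process CD'd into the overlay keeps a pointer to the overlay's
    record, so its path stays distinct from the underlying one even
    though both records resolve to the same vnode.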

    Ok, so once that is dealt with we need to make sure that the cache
    invalidation mechanism, our skeleton cache coherency layer, does
    not deadlock when it takes a locked namecache record and has to 
    invalidate a namecache topology elsewhere.  This case only occurs
    when a filesystem operation on the UNDERLYING filesystem occurs, because
    the underlying filesystem is not aware of the overlay.  In the case
    of the nullfs overlay the nullfs code is aware of the underlying filesystem
    and will make the appropriate namecache calls to the underlying
    filesystem's namecache topology.

    For an operation being done directly on the underlying filesystem the
    underlying filesystem is not aware of the overlay, but the namecache
    code IS aware of the overlay because it sees multiple namecache
    records attached to the vnode.  So the namecache code must scan
    the list of namecache structures associated with the vnode and issue
    the appropriate cache_inval*() calls on the namecache records other
    than the one it was called with.
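
    The scan described above could look roughly like the following
    sketch.  It assumes simplified structures; nc_valid, nc_attach(),
    nc_inval() and nc_inval_siblings() are hypothetical names, with
    nc_inval() standing in for whichever cache_inval*() call is
    appropriate.  Locking is deliberately left out, so the deadlock
    question raised earlier is not addressed here:

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

/* Hypothetical, simplified namecache record. */
struct namecache {
    struct vnode     *nc_vp;     /* vnode this record resolves to */
    struct namecache *nc_vnext;  /* next record on the vnode's list */
    int               nc_valid;  /* 1 = resolved, 0 = invalidated */
};

/* Hypothetical, simplified vnode. */
struct vnode {
    struct namecache *v_namecache;  /* list of records naming this vnode */
};

static void nc_attach(struct namecache *ncp, struct vnode *vp)
{
    ncp->nc_vp = vp;
    ncp->nc_valid = 1;
    ncp->nc_vnext = vp->v_namecache;
    vp->v_namecache = ncp;
}

/* Stand-in for a cache_inval*() call on a single record. */
static void nc_inval(struct namecache *ncp)
{
    ncp->nc_valid = 0;
}

/*
 * Invalidate every still-valid record on the vnode's list other than
 * the one the caller was invoked with.  Returns how many were invalidated.
 */
static int nc_inval_siblings(struct namecache *self)
{
    struct namecache *scan;
    int count = 0;

    if (self->nc_vp == NULL)
        return 0;
    for (scan = self->nc_vp->v_namecache; scan != NULL; scan = scan->nc_vnext) {
        if (scan != self && scan->nc_valid) {
            nc_inval(scan);
            count++;
        }
    }
    return count;
}
```

    With one underlying record and two overlay records on the same
    vnode, a rename issued through the underlying record would
    invalidate both overlay records and leave the caller's own record
    alone.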

    I think this is all very doable and, even better, does not represent
    any major surgery for systems not using nullfs (which is all of the
    them right now), so we can keep things stable during the work.  I know 
    there are several people interested in making nullfs work again, 
    especially Simon.  Who has time to actually code?  I would be able to
    help out but I'd prefer not to do the core coding.  

    Questions?  Interest?  Simon, you want to code this up ?

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




