Kernel update / DragonFly_Stable tag reverted to September 13th
Matthew Dillon
dillon at apollo.backplane.com
Fri Oct 8 03:09:39 PDT 2004
I still recommend that anyone with data they can't afford to lose
stick with the DragonFly_Stable tag for now.
I've made progress in my local tree tracking down the instabilities
on HEAD. None of it is committed yet and none of it will be until
I get a good solid 2 days of continuous successful tests on my test
boxes.
The main issue is that the 'old' vnode locking code is just too fragile
to handle the blocking conditions introduced by the 'new' namecache
code. The vnode locking code has *always* been an extremely complex
mess, with dozens and dozens of special cases. It is so complex that it's
very easy to break, and break it is what I did. The old BSD VFS code
(some of it is quite ancient!) used some extremely weird algorithms for
deactivating and reclaiming vnodes.
* Instead of using a normal locked, ref'd vnode during reclamation
(scrapping a vnode so it could be reused), the old code kept the
ref count at 0 and put the vnode in a special, very fragile state
for deactivation and reclamation purposes. There are many
historical reasons for this, but the biggest is that the
original vnode code put the vnode locks in attached filesystem
meta-data, in v_data, rather than embedding the locks in the vnode
itself.
Fortunately, the lock meta-data issue is already mostly fixed and
committed, so this is a matter of rewriting all the APIs to
assume an embedded vnode lock and getting rid of the special
0-refcount termination states.
* The vnode interlocks have been a disaster. These were introduced
in late FreeBSD 3.x or early 4.x or somewhere around there; I'd
have to look at the FreeBSD CVS logs to see exactly where. They were
designed for the original SMP implementation (and FreeBSD-5/6,
if anything, has made the interactions even *more* complex and
difficult to code against properly).
The problem with the interlocks is that accessing a vnode from
scratch requires several locking steps rather than just one,
and it's impossible to prevent a vnode from being ripped out
from under you, which is why vnodes wound up having to use a
stable storage medium (the sketch after this list shows the
difference between the two acquisition sequences).
* Numerous other issues.
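To make the locking difference concrete, here is a small userland sketch.
It uses plain C with pthread mutexes rather than the kernel's lockmgr
locks, and all of the names (struct vnode_model, vnode_get_model) are
made up for illustration; it is not DragonFly code. It only models the
point above: once the lock and ref count live in the vnode itself instead
of behind an interlock and fs-private data, acquisition collapses to a
single step.

#include <pthread.h>
#include <stdio.h>

struct vnode_model {
        pthread_mutex_t v_lock;         /* lock embedded in the vnode itself */
        int             v_usecount;     /* ref count guarded by v_lock       */
};

/*
 * Old scheme (roughly): lock the interlock, check that the vnode is not
 * being reclaimed, bump the ref count, drop the interlock, acquire the
 * real vnode lock (which can block), then re-check the vnode's state
 * because it may have changed while we were blocked.  Several steps,
 * several windows.
 *
 * Scheme modeled here: one lock operation against the embedded lock and
 * a ref count bump.  There are no intermediate steps during which the
 * vnode can be ripped out from under the caller.
 */
static int
vnode_get_model(struct vnode_model *vp)
{
        pthread_mutex_lock(&vp->v_lock);
        ++vp->v_usecount;
        pthread_mutex_unlock(&vp->v_lock);
        return (0);
}

int
main(void)
{
        struct vnode_model vn;

        pthread_mutex_init(&vn.v_lock, NULL);
        vn.v_usecount = 0;
        vnode_get_model(&vn);
        printf("usecount = %d\n", vn.v_usecount);
        pthread_mutex_destroy(&vn.v_lock);
        return (0);
}

In the real kernel the embedded lock is a lockmgr lock and the ref count
handling has more to it, but the shape of the problem is the same.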
I've managed to rewrite most of the mess. It only took a day, but I'm sure
there are bugs, so I probably won't commit anything until early next week.
I will post patch sets (it's already 11000+ lines worth of patches) so
the brave people can test it.
The new code uses a normal exclusively-locked vnode for VOP_INACTIVE
and VOP_RECLAIM (reclamation of a vnode for later reuse). It also
entirely gets rid of the vnode interlock, and I rewrote the mountlist
scanning code and getnewvnode() to close all the little timing windows
they had.
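As a rough picture of what that teardown path looks like, here is a
hypothetical userland sketch along the same lines as the one above (the
same toy vnode_model struct plus a v_data pointer, with stub functions
standing in for a filesystem's VOP_INACTIVE and VOP_RECLAIM methods).
Again, it is not the committed code; it only illustrates the invariant
that the vnode stays referenced and exclusively locked for the whole
teardown instead of sitting in the old 0-refcount limbo state.

#include <pthread.h>
#include <stdio.h>

struct vnode_model {
        pthread_mutex_t v_lock;         /* held exclusively during teardown  */
        int             v_usecount;     /* normal ref count, no magic 0 state */
        void            *v_data;        /* fs-private data detached on reclaim */
};

/* Stand-ins for a filesystem's VOP_INACTIVE / VOP_RECLAIM methods. */
static void
vop_inactive_stub(struct vnode_model *vp)
{
        printf("inactive: usecount=%d (still referenced and locked)\n",
            vp->v_usecount);
}

static void
vop_reclaim_stub(struct vnode_model *vp)
{
        printf("reclaim:  detaching v_data, usecount=%d\n", vp->v_usecount);
        vp->v_data = NULL;
}

/*
 * Tear a vnode down for reuse.  The invariant is simple: the vnode is
 * referenced and exclusively locked from start to finish, so no
 * special-case "being reclaimed" states are needed.
 */
static void
vnode_terminate_model(struct vnode_model *vp)
{
        pthread_mutex_lock(&vp->v_lock);        /* exclusive lock   */
        ++vp->v_usecount;                       /* normal reference */

        vop_inactive_stub(vp);
        vop_reclaim_stub(vp);

        --vp->v_usecount;
        pthread_mutex_unlock(&vp->v_lock);      /* now free for reuse */
}

int
main(void)
{
        struct vnode_model vn;

        pthread_mutex_init(&vn.v_lock, NULL);
        vn.v_usecount = 0;
        vn.v_data = (void *)&vn;        /* pretend fs data is attached */

        vnode_terminate_model(&vn);

        pthread_mutex_destroy(&vn.v_lock);
        return (0);
}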
This is in addition to all the new namecache code which is already in
the tree.
I'll probably post the first patch set this weekend sometime. I can
already get through a buildworld with it, so my hope is that once I track
down the last few issues (there are always issues when one generates
an 11000+ line patch set in one day!) I will have something that
is solid and stable. But, again, nothing is going to be committed until
my test boxes get through two whole days of buildworld -j loops.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>