Kernel update / DragonFly_Stable tag reverted to September 13th
Matthew Dillon
dillon at apollo.backplane.com
Fri Oct 8 03:09:39 PDT 2004
I still recommend that anyone with data they can't afford to lose
stick with the DragonFly_Stable tag for now.
I've made progress in my local tree tracking down the instabilities
on HEAD. None of it is committed yet and none of it will be until
I get a good solid 2 days of continuous successful tests on my test
boxes.
The main issue is that the 'old' vnode locking code is just too fragile
to handle the blocking conditions introduced by the 'new' namecache
code. The vnode locking code has *always* been an extremely complex
mess, with dozens and dozens of special cases. It is so complex that it's
very easy to break, and break it is what I did. The old BSD VFS code
(some of it is quite ancient!) used some extremely weird algorithms for
deactivating and reclaiming vnodes.
* Instead of using a normal locked, ref'd vnode during reclamation
(scrapping a vnode so it could be reused), the old code kept the
ref count at 0 and put the vnode in a special, very fragile state
for deactivation and reclamation purposes. There are many
historical reasons for this, but the biggest is that the
original vnode code put the vnode locks in attached filesystem
meta-data, in v_data, rather than embedding the locks in the vnode
itself.
Fortunately, the lock meta-data issue is already mostly fixed and
committed, so this is a matter of rewriting all the APIs to
assume an embedded vnode lock and getting rid of the special
0-refcount termination states.
* The vnode interlocks have been a disaster. These were introduced
in late FreeBSD 3.x or early 4.x or somewhere around there; I'd
have to look at the FreeBSD CVS logs to see exactly where. They were
designed for the original SMP implementation (and FreeBSD-5/6,
if anything, has made the interactions even *more* complex and
difficult to code against properly).
The problem with the interlocks is that accessing a vnode from
scratch requires several locking steps rather than just one,
and it's impossible to prevent a vnode from being ripped out
from under you, which is why vnodes wound up having to use a
stable storage medium (the sketch after this list shows the
difference between the two acquisition sequences).
* Numerous other issues.
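To make the locking difference concrete, here is a small userland sketch.
It uses plain C with pthread mutexes rather than the kernel's lockmgr
locks, and all of the names (struct vnode_model, vnode_get_model) are
made up for illustration; it is not DragonFly code. It only models the
point above: once the lock and ref count live in the vnode itself instead
of behind an interlock and fs-private data, acquisition collapses to a
single step.

#include <pthread.h>
#include <stdio.h>

struct vnode_model {
        pthread_mutex_t v_lock;         /* lock embedded in the vnode itself */
        int             v_usecount;     /* ref count guarded by v_lock       */
};

/*
 * Old scheme (roughly): lock the interlock, check that the vnode is not
 * being reclaimed, bump the ref count, drop the interlock, acquire the
 * real vnode lock (which can block), then re-check the vnode's state
 * because it may have changed while we were blocked.  Several steps,
 * several windows.
 *
 * Scheme modeled here: one lock operation against the embedded lock and
 * a ref count bump.  There are no intermediate steps during which the
 * vnode can be ripped out from under the caller.
 */
static int
vnode_get_model(struct vnode_model *vp)
{
        pthread_mutex_lock(&vp->v_lock);
        ++vp->v_usecount;
        pthread_mutex_unlock(&vp->v_lock);
        return (0);
}

int
main(void)
{
        struct vnode_model vn;

        pthread_mutex_init(&vn.v_lock, NULL);
        vn.v_usecount = 0;
        vnode_get_model(&vn);
        printf("usecount = %d\n", vn.v_usecount);
        pthread_mutex_destroy(&vn.v_lock);
        return (0);
}

In the real kernel the embedded lock is a lockmgr lock and the ref count
handling has more to it, but the shape of the problem is the same.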
I've managed to rewrite most of the mess. It only took a day, but I'm sure
there are bugs, so I probably won't commit anything until early next week.
I will post patch sets (it's already 11000+ lines worth of patches) so
the brave people can test it.
The new code uses a normal exclusively-locked vnode for VOP_INACTIVE
and VOP_RECLAIM (reclamation of a vnode for later reuse). It also
entirely gets rid of the vnode interlock, and I rewrote the mountlist
scanning code and getnewvnode() to close all the little timing windows
they had.
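As a rough picture of what that teardown path looks like, here is a
hypothetical userland sketch along the same lines as the one above (the
same toy vnode_model struct plus a v_data pointer, with stub functions
standing in for a filesystem's VOP_INACTIVE and VOP_RECLAIM methods).
Again, it is not the committed code; it only illustrates the invariant
that the vnode stays referenced and exclusively locked for the whole
teardown instead of sitting in the old 0-refcount limbo state.

#include <pthread.h>
#include <stdio.h>

struct vnode_model {
        pthread_mutex_t v_lock;         /* held exclusively during teardown  */
        int             v_usecount;     /* normal ref count, no magic 0 state */
        void            *v_data;        /* fs-private data detached on reclaim */
};

/* Stand-ins for a filesystem's VOP_INACTIVE / VOP_RECLAIM methods. */
static void
vop_inactive_stub(struct vnode_model *vp)
{
        printf("inactive: usecount=%d (still referenced and locked)\n",
            vp->v_usecount);
}

static void
vop_reclaim_stub(struct vnode_model *vp)
{
        printf("reclaim:  detaching v_data, usecount=%d\n", vp->v_usecount);
        vp->v_data = NULL;
}

/*
 * Tear a vnode down for reuse.  The invariant is simple: the vnode is
 * referenced and exclusively locked from start to finish, so no
 * special-case "being reclaimed" states are needed.
 */
static void
vnode_terminate_model(struct vnode_model *vp)
{
        pthread_mutex_lock(&vp->v_lock);        /* exclusive lock   */
        ++vp->v_usecount;                       /* normal reference */

        vop_inactive_stub(vp);
        vop_reclaim_stub(vp);

        --vp->v_usecount;
        pthread_mutex_unlock(&vp->v_lock);      /* now free for reuse */
}

int
main(void)
{
        struct vnode_model vn;

        pthread_mutex_init(&vn.v_lock, NULL);
        vn.v_usecount = 0;
        vn.v_data = (void *)&vn;        /* pretend fs data is attached */

        vnode_terminate_model(&vn);

        pthread_mutex_destroy(&vn.v_lock);
        return (0);
}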
This is in addition to all the new namecache code which is already in
the tree.
I'll probably post the first patch set this weekend sometime. I can
already get through a buildworld with it, so my hope is that once I track
down the last few issues (there are always issues when one generates
an 11000+ line patch set in one day!) I will have something that
is solid and stable. But, again, nothing is going to be committed until
my test boxes get through two whole days of buildworld -j loops.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>