hammer_alloc_data panic
Matthew Dillon
dillon at apollo.backplane.com
Tue Jul 15 18:24:46 PDT 2008
:I do not intend to sound discouraging; I'm just worried that the cries of
:those who have hit the reblocking issues and/or some stray bugs are going
:to cover the positive feedback.
:
:Aggelos
Actually I think we are doing very well, though I can see why
you might be a little rattled looking at it from the outside. I
apologize for that, and I will try to explain what is going on to
alleviate any concerns. In fact, I am going to go into great detail;
this is as much a philosophical document as it is an explanation :-)
It is virtually the only form of development possible for a one-man
project, or even a two or three-man project.
Nearly all the bug flow is due to the continued work being pushed into
the filesystem. That work essentially ended last weekend with the
last major mirroring infrastructure commit.
Virtually none of the bug flow is related to the older HAMMER code
pertaining to basic filesystem operation. For example, the UNDO
crash recovery and filesystem corruption bugs stopped occurring almost
a month ago. Basic filesystem operations... read, write, open, close,
readdir, chmod, etc.. have been stable for well over 2 months.
Historical lookups and snapshots have been stable for over 3 months.
I purposefully destabilized truncation for a few days last week,
and I purposefully destabilized the deadlock handling for a few days
last weekend, all in order to get the mirroring code operational (and in
the case of truncation to fix UNDO FIFO issues related to the
limited UNDO space in small HAMMER filesystems).
When I said 2 weeks ago that I wasn't sure I would be able to get the
mirroring and PFS code in, this is what I was talking about. It isn't
just coding and committing, it is also getting the basic testing done,
the utility support done, and fixing the bugs introduced when surgery
is required on other parts of the filesystem to support the new feature.
What do I mean by purposeful destabilization? Let me give you another
example. Take the mirroring code again. In order to propagate a
transaction id up the B-Tree to support incremental mirroring I couldn't
abort half way through with an EDEADLK and have the high level code
retry, because the governing insertion or deletion had already occurred.
So what I did was implement the propagation *without* deadlock handling,
got it working, then worked through the deadlocks (the 'purposeful
destabilization') that I had created. I knew I was introducing some
deadlock issues when I did that, but it was still the fastest way to
get it implemented.
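
To make the propagation step concrete, here is a minimal sketch in C. It
is NOT HAMMER's code: toy_node, mirror_tid and toy_propagate_tid are
hypothetical names invented for illustration, and the sketch assumes each
node simply records the newest transaction id seen anywhere beneath it.
It deliberately has no locking and no error return, mirroring the first
step described above, where the upward walk has to run to completion
because the insertion or deletion that triggered it is already done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy node; not HAMMER's on-disk or in-memory layout. */
struct toy_node {
    struct toy_node *parent;   /* NULL at the root */
    uint64_t mirror_tid;       /* newest transaction id seen in this subtree */
};

/*
 * Push tid from a modified leaf toward the root and return the number
 * of nodes updated.  There is no locking and no failure path here: the
 * modification that triggered the walk is assumed to have happened
 * already, so the walk cannot be abandoned half way and retried.
 *
 * Assumed invariant: a parent's mirror_tid is never smaller than any
 * of its children's, which is what makes the early exit safe.
 */
static int toy_propagate_tid(struct toy_node *leaf, uint64_t tid)
{
    int updated = 0;

    assert(leaf != NULL);
    for (struct toy_node *n = leaf; n != NULL; n = n->parent) {
        if (n->mirror_tid >= tid)
            break;              /* this node and its ancestors are current */
        n->mirror_tid = tid;
        updated++;
    }
    return updated;
}
```

With ids like these, an incremental scan could presumably skip any
subtree whose recorded id is not newer than the last id already
mirrored. A real implementation would have to lock each node on the way
up, which is where the deadlock work comes in.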
So what you are seeing is not really new crops of unexpected bugs, but
instead mostly expected bugs whose flow is carefully managed so they
will be fixed by the release, and a few I left on the backburner
(mostly related to filesystem-full issues but also a few related to the
handling of I/O errors), because I knew I could fix them in a day or two.
80% of the bug flow is from purposeful destabilization, and about 20%
is in the 'unexpected bug' category.
HAMMER is a really complex project, and the complexity is somewhat
of a moving target because all the myriad theory does not always fit
together seamlessly. It is not possible to implement each subsystem
independently of the other subsystems to the point where it is perfect.
Invariably working on a later subsystem requires going back to the
earlier ones and making (sometimes major) changes to the algorithms,
with massive debugging in between each major piece of subsystem work so
that the bugs do not create geometrically complex (and hard to debug)
failures.
The constant flow of bugs is the intended outcome for this sort of
development style. It is the ONLY single-person development style
that has even a half chance of working for a complex project,
something I have learned through the years with various large projects
such as Diablo, various embedded OSs, DICE (The C compiler I wrote for
the Amiga many years ago), numerous other projects, and now HAMMER.
In any case, this week is crunch time for the remaining bugs and I'm still
on schedule! I'm quite happy that I get to dedicate this week just to
fixing bugs, and won't be introducing any new algorithms to start the
endless bug cycle going again :-). Even I was feeling a bit flustered
last week, trying to squeeze that massive, massive mirroring
implementation in. That was literally a 100-hour work week for me.
I was stressing out big-time last week. This week is smooth sailing.
-Matt
Matthew Dillon
<dillon at backplane.com>