Root mount failed:5
Matthew Dillon
dillon at apollo.backplane.com
Fri Oct 12 11:13:23 PDT 2012
Discussing this on IRC we came up with another scenario, again
related to a power failure.
When we update b-tree nodes we issue BIO writes of the whole node,
but in many cases we only generate UNDO information for the byte ranges
that changed.
Because your b-tree node that failed is so close to the root I suspect
it was doing a byte-range undo.
If a power failure occurred during the physical disk write of the b-tree
node it is possible (even likely) that the disk corrupted more than just
the byte range being modified. It could have corrupted data outside
the modification range which was still being rewritten as part of the I/O
operation.
In this situation the UNDO on reboot would not have been able to fix the
problem. I suspect that this might have been what happened, if your
crash was due to a power failure.
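
To make the failure mode above concrete, here is a toy sketch in Python. It is
not HAMMER code: the node size, the helper names (make_undo_range,
apply_undo_range) and the corruption pattern are all assumptions chosen for
illustration. It only shows that an UNDO record covering a byte range cannot
repair damage that lands elsewhere in the same node.

```python
NODE_SIZE = 64  # hypothetical size; real B-Tree nodes are much larger

def make_undo_range(node, offset, length):
    """Save only the bytes that are about to be modified (byte-range UNDO)."""
    return (offset, bytes(node[offset:offset + length]))

def apply_undo_range(node, undo):
    """Roll the saved bytes back into the node during recovery."""
    offset, old = undo
    node[offset:offset + len(old)] = old

# The node as it exists on the media before the update.
original = bytes(range(NODE_SIZE))
media = bytearray(original)

# The update only changes bytes 8..15, so the UNDO covers just that range.
undo = make_undo_range(media, 8, 8)

# The whole node is written, power fails mid-write, and the drive
# scribbles over bytes 0..31 (assumed corruption pattern).
media[0:32] = b"\xff" * 32

# Recovery on reboot replays the UNDO record.
apply_undo_range(media, undo)

range_restored = media[8:16] == original[8:16]
node_intact = bytes(media) == original
print(range_restored, node_intact)  # True False
```

The modified range comes back, but bytes 0..7 and 16..31 stay corrupted, so
the node as a whole is still bad.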
Now unfortunately when a power failure occurs during a physical write
to the disk, it is possible for the disk to corrupt numerous sectors
above and beyond the ones actually being written, particularly with a
consumer drive that might be doing whole-track writes. However,
it is more likely than not that we can recover more of these situations
if we change HAMMER to write whole-block UNDOs for B-Tree nodes instead
of byte-range undos. So I am going to make that change for this upcoming
release.
This will generate more UNDO data but I don't see that we have a choice.
We want HAMMER to be able to recover from more of these power-failure
types of situations. I can't cover all the bases because the disk can
literally corrupt anything when a power failure occurs during a write,
but I should be able to cover more conventional cases by making this
change.
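
As a rough illustration of the trade-off, the toy sketch below (Python, not
HAMMER code) saves the whole node before writing it. The node size, the helper
names and the corruption pattern are assumptions, and it also assumes the UNDO
record itself survives and the damage stays inside the node being written.

```python
NODE_SIZE = 64  # hypothetical size; real B-Tree nodes are much larger

def make_undo_block(node):
    """Save a copy of the entire node (whole-block UNDO)."""
    return (0, bytes(node))

def apply_undo(node, undo):
    """Roll the saved bytes back into the node during recovery."""
    offset, old = undo
    node[offset:offset + len(old)] = old

original = bytes(range(NODE_SIZE))
media = bytearray(original)

# The update still only changes 8 bytes, but the UNDO covers the full node.
modified_len = 8
undo = make_undo_block(media)

# Power fails during the write; bytes 0..31 are corrupted (assumed pattern).
media[0:32] = b"\xff" * 32

# Recovery on reboot replays the UNDO record.
apply_undo(media, undo)

node_intact = bytes(media) == original
undo_bytes = len(undo[1])
print(node_intact, undo_bytes, modified_len)  # True 64 8
```

The node is fully restored, at the cost of logging the full node size
instead of only the bytes that changed.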
-Matt