Root mount failed:5

Matthew Dillon dillon at apollo.backplane.com
Fri Oct 12 11:13:23 PDT 2012


    Discussing this on IRC we came up with another scenario, again
    related to a power failure.

    When we update b-tree nodes we issue BIO writes of the whole node,
    but in many cases we only generate UNDO information for the
    byte-ranges that changed.

    Because your b-tree node that failed is so close to the root I suspect
    it was doing a byte-range undo.

    If a power failure occurred during the physical disk write of the b-tree
    node it is possible (even likely) that the disk corrupted more than just
    the byte range being modified.  It could have corrupted data outside
    the modification range which was still being rewritten as part of the I/O
    operation.

    In this situation the UNDO on reboot would not have been able to fix the
    problem.  I suspect that this might have been what happened, if your
    crash was due to a power failure.
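
    To make the failure mode concrete, here is a rough sketch.  It is not
    HAMMER's actual code: the 4096-byte node size, the range_undo structure
    and all function names are assumptions made up for illustration.  It
    records only the modified byte range, simulates a torn whole-node write
    that scribbles somewhere in the node, and then replays the UNDO.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 4096  /* assumed node size, for illustration only */

/* Hypothetical byte-range UNDO record: saves only the bytes about to change. */
struct range_undo {
    size_t  off;
    size_t  len;
    uint8_t saved[NODE_SIZE];
};

/* Save the pre-image of node[off .. off+len) before it is modified. */
static void range_undo_record(struct range_undo *u, const uint8_t *node,
                              size_t off, size_t len)
{
    assert(off <= NODE_SIZE && len <= NODE_SIZE - off);
    u->off = off;
    u->len = len;
    memcpy(u->saved, node + off, len);
}

/* Replay on reboot: puts back the recorded range and nothing else. */
static void range_undo_replay(const struct range_undo *u, uint8_t *node)
{
    memcpy(node + u->off, u->saved, u->len);
}

/*
 * Simulate a whole-node write interrupted by a power failure.
 * The intended change covers [mod_off, mod_off+mod_len); the failure
 * leaves garbage in [bad_off, bad_off+bad_len).
 * Returns 1 if replaying the byte-range UNDO restores the original
 * node contents, 0 if damage remains.
 */
static int range_undo_recovers(size_t mod_off, size_t mod_len,
                               size_t bad_off, size_t bad_len)
{
    static uint8_t original[NODE_SIZE], ondisk[NODE_SIZE];
    static struct range_undo u;
    size_t i;

    assert(bad_off <= NODE_SIZE && bad_len <= NODE_SIZE - bad_off);

    for (i = 0; i < NODE_SIZE; i++)
        original[i] = (uint8_t)(i * 7 + 1);     /* arbitrary node contents */
    memcpy(ondisk, original, NODE_SIZE);

    range_undo_record(&u, ondisk, mod_off, mod_len);
    memset(ondisk + mod_off, 0xAA, mod_len);    /* the intended modification */
    memset(ondisk + bad_off, 0xFF, bad_len);    /* torn-write garbage */

    range_undo_replay(&u, ondisk);
    return memcmp(ondisk, original, NODE_SIZE) == 0;
}
```

    With damage confined to the modified range, for example
    range_undo_recovers(128, 64, 128, 64), the replay succeeds.  With
    damage elsewhere in the node, for example
    range_undo_recovers(128, 64, 1024, 512), the replay leaves the garbage
    in place, because those bytes were never saved.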

    Now unfortunately when a power failure occurs during a physical write
    to the disk, it is possible for the disk to corrupt numerous sectors
    above and beyond the ones actually being written, particularly with a
    consumer drive that might be doing whole-track writes.  Even so, it is
    likely that we can recover from more of these situations if we change
    HAMMER to write whole-block UNDOs for B-Tree nodes instead of
    byte-range UNDOs.  So I am going to make that change for this upcoming
    release.
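
    A whole-block UNDO can be sketched the same way.  Again this is only
    an illustration under assumptions, not the real implementation: the
    4096-byte node size, the block_undo structure and the function names
    are hypothetical.  The record saves the entire node, so the replay
    overwrites any damage inside the node no matter where it landed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 4096  /* assumed node size, for illustration only */

/* Hypothetical whole-block UNDO record: a full pre-image of the node. */
struct block_undo {
    uint8_t saved[NODE_SIZE];
};

/* Save the entire node before any part of it is modified. */
static void block_undo_record(struct block_undo *u, const uint8_t *node)
{
    memcpy(u->saved, node, NODE_SIZE);
}

/* Replay on reboot: rewrites the entire node from the saved image. */
static void block_undo_replay(const struct block_undo *u, uint8_t *node)
{
    memcpy(node, u->saved, NODE_SIZE);
}

/*
 * Simulate a torn whole-node write that leaves garbage in
 * [bad_off, bad_off+bad_len), then replay the whole-block UNDO.
 * Returns 1 if the node matches its original contents afterwards.
 */
static int block_undo_recovers(size_t bad_off, size_t bad_len)
{
    static uint8_t original[NODE_SIZE], ondisk[NODE_SIZE];
    static struct block_undo u;
    size_t i;

    assert(bad_off <= NODE_SIZE && bad_len <= NODE_SIZE - bad_off);

    for (i = 0; i < NODE_SIZE; i++)
        original[i] = (uint8_t)(i * 7 + 1);     /* arbitrary node contents */
    memcpy(ondisk, original, NODE_SIZE);

    block_undo_record(&u, ondisk);
    memset(ondisk + bad_off, 0xFF, bad_len);    /* torn-write garbage */

    block_undo_replay(&u, ondisk);
    return memcmp(ondisk, original, NODE_SIZE) == 0;
}

/* UNDO payload bytes for one node update under each scheme. */
static size_t undo_payload_bytes(int whole_block, size_t mod_len)
{
    return whole_block ? (size_t)NODE_SIZE : mod_len;
}
```

    In this model block_undo_recovers() returns 1 for any damaged range
    within the node, while the payload grows from mod_len bytes to
    NODE_SIZE bytes per update.  The sketch only covers damage inside the
    node being written; it says nothing about sectors corrupted elsewhere
    on the disk.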

    This will generate more UNDO data but I don't see that we have a choice.
    We want HAMMER to be able to recover from more of these power-failure
    types of situations.  I can't cover all the bases because the disk can
    literally corrupt anything when a power failure occurs during a write,
    but I should be able to cover more conventional cases by making this
    change.

						-Matt



