Unable to mount hammer file system Undo failed

Francis GUDIN fgudin+dfly at bsdbellule.fr
Thu Jul 19 13:31:16 PDT 2012


Wojciech Puchar writes:

>>> 
>>> Any Tree-like structure produces a huge risk of losing much more data than 
>>> was corrupted in the first place.
>>
>> Not so sure about that statement, but well, let's agree we might disagree :)
> disagreement is a source of all good ideas. but you should explain why.

Well, arguing can be fun at times, but my free time is rather limited; I wish
this thread could die peacefully.

I think I made my points clear. As I'm far from qualified to discuss
these topics, I'll just add a bit, but I won't repeat my statements about the
prerequisites: where the line must be drawn and which running conditions are
expected from the FS's point of view.

> my explanation below.
> 
>>
>> You asked for a little documentation about its layout, workings; this may be 
>> a good fit: http://www.dragonflybsd.org/presentations/nycbsdcon08/
> this is about an older hammer revision.
> 
> Matthew claimed some time ago that the new hammer is completely different.
> 
> But after reading it I understood that everything is in the B-Tree: exactly 
> what I call dangerous. The B-Tree is used to store everything: directory entries, inodes, etc.

I won't speak on behalf of Matt, but if I understand correctly HAMMER2 will use
structures other than a B-Tree, with the goal of reducing complexity.
The presentation at the link I gave you is rather outdated, and targets HAMMER
"1" (the early revisions of the FS in its first design).

> B-Trees are dangerous if they are used as the only way to access data. 
> A corrupted B-Tree means no access to anything below it!!
> 
> 
> What I see as the main differences between HAMMER and ZFS are:
> 
> 1) practical - hammer is very fast and doesn't use gigabytes of RAM or lots of 
> CPU. Not that I did a lot of tests, but it seems like UFS speed, 
> sometimes even more, rarely less.
> 
> It is actually USEFUL, which cannot be said of ZFS ;)

Sorry, but I also just love ZFS for the business case I rely on it for. It has
some clearly nice features.

> 2) the basic way of storing data is similar, the details are different, the 
> danger is similar

No: this is wrong. I won't make a digest of the papers on both of them for you;
read them yourself.

> 3) HAMMER has a recovery program. It will need to read the whole media. Assume 
> a 2TB disk at 100MB/s -> 20000 seconds == about 6 hours.
> ZFS doesn't have one; there are a few businesses that recover ZFS data for 
> money. For sure they don't feel it's a crisis ;)

I have never had to use that recovery program. If you search the archives, only
a handful of people ever really needed it. Don't assume you'll need it
routinely.
The truth is: whatever happens (a crash, a lost power supply, a sick HDD), you
just mount the filesystem again, a few transactions may be completed or rolled
back, and that's it. A matter of ten seconds.

Running recover on a 2TB medium will be slow, of course. But then you're trying
to recover a full filesystem, including as much history as was there before
the crash.
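
If you ever do get there, it is a single hammer(8) directive. A rough sketch
from memory, where the device and target paths are made-up placeholders (check
hammer(8) on your system for the exact syntax):

  # scan the raw HAMMER volume and copy whatever can be salvaged into a
  # scratch directory living on another filesystem
  hammer -f /dev/serno/<disk-serial>.s1d recover /mnt/scratch/recovered

It reads the whole media looking for B-Tree elements, hence the hours-long
estimate above; it is a last resort, not the normal crash-recovery path.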

> assume that I store my clients' data in a hammer filesystem and it crashed 
> completely, but the disks are fine. Assume it's Tuesday 16:00, the last copy 
> was done automatically Monday 17:30, the failure is found at 17:00, and I am 
> on site at 18:00.
> 
> I ask my client - what do you prefer:
> 
> - wait 6 hours, with a good chance that most of your data will be 
> recovered, in which case the few missing bits would be found and restored from 
> backup, and if not we would start a restore from backup taking 
> another 6 hours?

Moot point. See above.

> - just clear things out and start a restore from backup, so that everything is 
> recovered for sure, as it was yesterday after work?
> 
> 
> the answer?

The answer using hammer: use mirror-stream and keep your data on another
disk, connected to a different host, with a state "as of" one minute ago in the
worst case.
Dead hardware? Just swap the machines, promote the slave PFS to master, and
you're done.
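
For the record, that whole setup boils down to a couple of hammer(8)
directives. The PFS paths and host name below are only placeholders, and this
is a sketch rather than a copy-paste recipe (the slave PFS has to be created
beforehand with pfs-slave and the master's shared-uuid):

  # on the live host: continuously stream PFS changes to the slave
  hammer mirror-stream /pfs/clients backuphost:/pfs/clients-slave

  # if the live host dies: on the backup host, promote the slave to master
  hammer pfs-upgrade /pfs/clients-slave

mirror-stream keeps re-syncing as transactions complete on the master, which
is why the slave lags by a minute or so at worst.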

> THE ANSWER:
> ---------------------------------------
> 1) divide the disk space into metadata space and data space. The amount of 
> metadata space is defined at filesystem creation, say 3% of the whole drive.

And then you're into "gosh! I never thought it'd store so many small files!
I'm screwed" territory.

> 2) data stored only in B-Tree leaves, and all B-Tree leaves stored in the 
> "metadata space". A few critical filesystem blocks are stored here too, at a 
> predefined place.
> 
> 3) everything else stored in the data space: B-Tree blocks excluding leaves, 
> the undo log, actual data.
> 
> 
> 4) everything else as it already is, with modifications to make sure every 
> B-Tree leaf block has data describing it properly: inodes having the inode 
> number inside, directories having their inode number inside too. AFAIK 
> it is already like that.
> 
> 5) hammer recover modified to scan this 3% of space and then rebuild the 
> B-Tree. It will work as fast as or faster than fsck_ffs this way, in spite of 
> being a "last resort" tool.
> ---
> 
> THE RESULT: a fast and featureful filesystem that can always be quickly 
> recovered, even in "last resort" cases.

I just don't follow what you meant, honestly. But well, show us the code if you
feel brave.

That'll be the last reply to this thread for me.

Good night,
-- 
Francis




