Futures - HAMMER comparison testing?
Bill Hacker
wbh at conducive.org
Fri Jan 18 02:51:44 PST 2008
Matthew Dillon wrote:
:But - at the end of the day - how much [extra?] on-disk space will be
:needed to ensure mount 'as-of' is 'good enough' for some realistic span
:(a week?, a month?)? 'Forever' may be too much to ask.
The amount of disk needed is precisely the same as the amount of
historical data (different from current data) that must be retained,
plus record overhead.
So it comes down to how much space you are willing to eat up to store
the history, and what kind of granularity you will want for the history.
OK - so it WILL be a 'tunable', then.
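The tradeoff described above (history space = modified data retained, plus per-record overhead) can be sketched with a back-of-the-envelope calculation. The numbers and the 64-byte overhead figure below are illustrative assumptions, not actual HAMMER parameters:

```python
# Rough estimate of history storage cost. RECORD_OVERHEAD is an assumed
# per-record metadata cost, not HAMMER's real on-disk figure.
RECORD_OVERHEAD = 64

def history_space(changed_bytes_per_day, records_per_day, days_retained):
    """Space consumed by retained history: the modified data itself
    plus record overhead, accumulated over the retention window."""
    data = changed_bytes_per_day * days_retained
    overhead = RECORD_OVERHEAD * records_per_day * days_retained
    return data + overhead

# e.g. 2 GiB of churn per day across ~100k records, kept for 30 days:
total = history_space(2 * 1024**3, 100_000, 30)
print(total / 1024**3)  # ~60 GiB of data plus a fraction of a GiB overhead
```

The point being that the retention window and churn rate dominate; record overhead only matters when the churn is made of many tiny writes.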
FWIW - my yardsticks at the 'heavy' or most wasteful end are punch card
& paper/mylar tape on low/no RAM systems, where 'backup' is essentially
of 'infinite' granularity, moving through WORM storage to Plan9 Venti,
et al.
AFAIK, none of the oldest 'write once' methods are in even 'virtualized'
use - save possibly in the FAA or military fields, as few entities have
any practical use for that sort of history.
At the other end, one of our projects involved storing the floor plans
of 60,000 buildings on RAID1. A technician manually rebuilding a failed
array mirrored the empty HDD onto the full one, and over 600 CDs had to
be manually reloaded.
In that case, there never had been risk of loss - anyone could buy the
latest CDs from the government lands department.
What his error cost us was 'only' time and inconvenience.
HAMMER cannot protect against all forms of human error - BUT - if it
inherently rebuilds more intelligently than the least-intelligent of
RAID1, it can greatly reduce the opportunity for that sort of 'accident'
to occur.
:How close are we to being able to start predicting that storage-space
:efficiency relative to ${some_other_fs}?
:
:Bill
Ultimately it will be extremely efficient simply by the fact that
there will be a balancer going through it and repacking it.
"... constantly, and in the background..." (I presume)
"... and with tunable frequency and priority." (I wish, eventually).
For the moment (and through the alpha release) it will be fairly
inefficient because it is using fixed 16K data records, even for small
files. The on-disk format doesn't care... records can reference
variable-length data from around 1MB down to 64 bytes. But supporting
variable-length data requires implementing some overwrite cases that
I don't want to do right now.
Is variable-length still likely to have a payback if the data records
were to be fixed at 512B or 1024B or integer multiples thereof?
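The cost of fixed-size records for small files is easy to quantify. A quick sketch of the internal-fragmentation ("slack") arithmetic, assuming the whole last record is allocated regardless of file size:

```python
import math

def slack(file_size, record_size):
    """Bytes wasted when a file is stored in fixed-size records:
    allocated space minus actual file size."""
    records = max(1, math.ceil(file_size / record_size))
    return records * record_size - file_size

# A 100-byte file in a fixed 16K record wastes nearly the whole record;
# with 512-byte records the same file wastes under half a KB.
print(slack(100, 16 * 1024))
print(slack(100, 512))
```

So smaller fixed records recover most of the win for small files, though variable-length data still avoids the record-split machinery on unaligned overwrites described below.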
This only applies to regular files
of course. Directories store directory entries as records, not as data,
so directories are packed really nicely.
e.g. if you have one record representing, say, 1MB of data, and you
write 64 bytes right smack in the middle of that, the write code will
have to take that one record, mark it as deleted, then create three
records to replace it (one pointing to the unchanged left portion of
the original data, one pointing to the 64 bytes of overwritten data,
and one pointing to the unchanged right portion of the original data).
The recovery and deletion code will also have to deal with that sort
of overlaid data situation. I'm not going to be writing that
feature for a bit. There are some quick hacks I can do too, for
small files, but it's not on my list prior to the alpha release.
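The left / new / right split described above can be modeled in a few lines. This is a sketch of the bookkeeping only (records as offset/length pairs), not HAMMER's actual on-disk structures:

```python
def overwrite(record, off, length):
    """Split one data record when a byte range inside it is overwritten.
    record is (start, size). Returns the replacement records: the
    unchanged left portion, the newly written bytes, and the unchanged
    right portion (left/right omitted if the write touches an edge)."""
    start, size = record
    end = start + size
    pieces = []
    if off > start:                        # unchanged left portion
        pieces.append(("old", start, off - start))
    pieces.append(("new", off, length))    # the overwritten bytes
    if off + length < end:                 # unchanged right portion
        pieces.append(("old", off + length, end - (off + length)))
    return pieces

# 64 bytes written smack in the middle of a 1MB record -> three records:
mb = 1024 * 1024
print(overwrite((0, mb), mb // 2, 64))
```

The deletion and recovery code has to walk exactly this kind of overlap, which is why the feature is deferred.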
Remember that HAMMER is designed for large filesystems which don't fill
up instantly. Consequently it will operate under the assumption that
it can take its time to recover free space. If one doesn't want to use
the history feature one can turn it off, of course, or use a very
granular retention policy.
My local backup system is currently using a 730GB UFS partition and it
is able to backup apollo, crater, and leaf with daily cpdups (using
the hardlink snapshot trick) going back about 3 months. In fact, I
can only fill up that 730GB about half way because fsck runs out of
memory and fails once you get over around 50 million inodes (mostly
dependent on the number of directories you have)... on UFS, that is.
I found that out the hard way.
. . which reminds us what we will ALL soon face if we do NOT seek newer
solutions!
It takes almost a day for fsck to
recover the filesystem even half full. I'll be happy when I can throw
that old stuff away.
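The "hardlink snapshot trick" mentioned above (which cpdup, and rsync's --link-dest, automate) can be sketched in a few lines: each snapshot is a fresh directory tree whose unchanged files are hard links into the previous snapshot, so only changed files consume new space. This is a flat-directory simplification for illustration, not cpdup's actual behavior:

```python
import os, shutil, tempfile

def snapshot(source, prev, dest):
    """Copy source into dest; files unchanged since the prev snapshot
    become hard links to prev's copy instead of fresh data."""
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(source):
        src = os.path.join(source, name)
        old = os.path.join(prev, name) if prev else None
        new = os.path.join(dest, name)
        if old and os.path.exists(old) and \
           os.path.getmtime(old) >= os.path.getmtime(src):
            os.link(old, new)        # unchanged: share the inode
        else:
            shutil.copy2(src, new)   # changed or new: store a fresh copy

# demo: two snapshots of an unchanged file end up sharing one inode
base = tempfile.mkdtemp()
src = os.path.join(base, "files"); os.makedirs(src)
open(os.path.join(src, "a.txt"), "w").write("hello")
snap1, snap2 = (os.path.join(base, s) for s in ("snap1", "snap2"))
snapshot(src, None, snap1)
snapshot(src, snap1, snap2)
```

This is also why fsck chokes on such backup trees: every snapshot multiplies the directory entry and inode-reference count even though the data is shared.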
-Matt
Matthew Dillon
<dillon at backplane.com>
. . or just relegate it to what it still does faster/better. IF..
I hope and trust that DragonFly BSD will earn a place as a 'broad
spectrum' OS, competitive across the board with alternatives.
But - if not, or even just 'not at first'
- much as OpenBSD and NetBSD have long been seen as good choices for
routers and firewalls, DragonFly should be able to carve out a viable
niche as the better choice for centralized / clustered / shared-use
servers on the basis of:
- superior storage management
- cleaner kernel virtualization
- the very extensive code audit & cleanup that has been ongoing since
day one
Bill