Futures - HAMMER comparison testing?
Michael Neumann
mneumann at ntecs.de
Sat Jan 19 04:10:07 PST 2008
Matthew Dillon wrote:
:But - at the end of the day - how much [extra?] on-disk space will be
:needed to ensure mount 'as-of' is 'good enough' for some realistic span
:(a week?, a month?)? 'Forever' may be too much to ask.
The amount of disk needed is precisely the same as the amount of
historical data (different from current data) that must be retained,
plus record overhead.
So it comes down to how much space you are willing to eat up to store
the history, and what kind of granularity you will want for the history.
:How close are we to being able to start predicting that storage-space
:efficiency relative to ${some_other_fs}?
:
:Bill
Ultimately it will be extremely efficient simply by the fact that
there will be a balancer going through it and repacking it.
For the moment (and through the alpha release) it will be fairly
inefficient because it is using fixed 16K data records, even for small
files. The on-disk format doesn't care... records can reference
variable-length data from around 1MB down to 64 bytes. But supporting
variable-length data requires implementing some overwrite cases that
I don't want to do right now. This only applies to regular files
of course. Directories store directory entries as records, not as data,
so directories are packed really nicely.
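As a rough illustration of the small-file inefficiency described above,
here is some back-of-the-envelope arithmetic. It assumes every regular
file is rounded up to whole 16K data records and ignores per-record
overhead; the function names are made up for this sketch and are not
HAMMER code.

```python
RECORD_SIZE = 16 * 1024  # fixed data record size mentioned above


def allocated_bytes(file_size, record_size=RECORD_SIZE):
    """Bytes of data space used if a file is stored in whole records."""
    records = -(-file_size // record_size)  # ceiling division
    return records * record_size


def slack_bytes(file_size, record_size=RECORD_SIZE):
    """Bytes allocated but not holding file data (record overhead ignored)."""
    return allocated_bytes(file_size, record_size) - file_size
```

Under these assumptions a 100-byte file occupies one full record, so
nearly all of its 16K is slack, while a file that is an exact multiple
of 16K wastes nothing.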
e.g. if you have one record representing, say, 1MB of data, and you
write 64 bytes right smack in the middle of that, the write code will
have to take that one record, mark it as deleted, then create three
records to replace it (one pointing to the unchanged left portion of
the original data, one pointing to the 64 bytes of overwritten data,
and one pointing to the unchanged right portion of the original data).
The recovery and deletion code will also have to deal with that sort
of overlaid data situation. I'm not going to be writing that
feature for a bit. There are some quick hacks I can do too, for
small files, but it's not on my list prior to the alpha release.
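The three-record split described above can be sketched in a few lines.
This is only a toy model under stated assumptions: `Record`,
`overwrite` and `read_file` are hypothetical names, the "disk" is an
append-only bytearray, and none of this reflects HAMMER's actual
on-disk structures. Holes and writes past end-of-file are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    file_off: int   # byte offset of this extent within the file
    length: int     # number of bytes covered
    data_off: int   # where the bytes live in the append-only data store
    deleted: bool = False


def overwrite(records, store, file_off, payload):
    """Return a new record list after writing payload at file_off.

    Existing data is never modified. The payload is appended to store,
    each overlapped record is kept but marked deleted (history), and
    trimmed left/right records point at its unchanged bytes.
    """
    end = file_off + len(payload)
    new_data_off = len(store)
    store.extend(payload)
    out = []
    for r in records:
        r_end = r.file_off + r.length
        if r.deleted or r_end <= file_off or r.file_off >= end:
            out.append(r)  # not touched by this write
            continue
        out.append(Record(r.file_off, r.length, r.data_off, deleted=True))
        if r.file_off < file_off:  # unchanged left portion
            out.append(Record(r.file_off, file_off - r.file_off, r.data_off))
        if r_end > end:  # unchanged right portion
            out.append(Record(end, r_end - end,
                              r.data_off + (end - r.file_off)))
    out.append(Record(file_off, len(payload), new_data_off))
    return out


def read_file(records, store):
    """Reassemble the current file contents from the live records."""
    live = sorted((r for r in records if not r.deleted),
                  key=lambda r: r.file_off)
    return b"".join(bytes(store[r.data_off:r.data_off + r.length])
                    for r in live)
```

Writing 64 bytes into the middle of a single large record leaves one
deleted record plus three live ones, and the original bytes stay in the
store untouched, which is what keeps the history readable.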
To me it seems that this makes compression somewhat easier to
implement, mainly because the old data is kept. So when you overwrite
compressed data, the system would only compress the new data and
introduce pointers to the left portion and right portion of the old
data. What might be complicated to handle is the difference in size
between compressed and uncompressed data in all the buffers.
If this yields a compression ratio of 2x, this could be extremely useful,
especially due to the historic nature of HAMMER (deleted files take up
less disk space).
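To make the idea concrete, here is a minimal sketch of overwriting
compressed data without recompressing the old block. It uses zlib
purely as a stand-in compressor; `Extent`, `make_extent`, `overwrite`
and `read_file` are hypothetical names, and history (deleted records)
is left out to keep it short.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file_off: int   # offset of this extent within the file
    length: int     # uncompressed bytes covered
    blob: bytes     # compressed block, shared and never rewritten
    inner_off: int  # offset of this extent inside the decompressed blob


def make_extent(file_off, payload):
    """Compress payload into a fresh block covering it entirely."""
    return Extent(file_off, len(payload), zlib.compress(payload), 0)


def overwrite(extents, file_off, payload):
    """Compress only the new data; old blocks are referenced, not rewritten."""
    end = file_off + len(payload)
    out = []
    for e in extents:
        e_end = e.file_off + e.length
        if e_end <= file_off or e.file_off >= end:
            out.append(e)  # not touched by this write
            continue
        if e.file_off < file_off:  # pointer to the left portion
            out.append(Extent(e.file_off, file_off - e.file_off,
                              e.blob, e.inner_off))
        if e_end > end:  # pointer to the right portion
            out.append(Extent(end, e_end - end, e.blob,
                              e.inner_off + (end - e.file_off)))
    out.append(make_extent(file_off, payload))
    return sorted(out, key=lambda e: e.file_off)


def read_file(extents):
    """Decompress each referenced block and slice out the live range."""
    parts = []
    for e in extents:
        raw = zlib.decompress(e.blob)
        parts.append(raw[e.inner_off:e.inner_off + e.length])
    return b"".join(parts)
```

Note that reading even a small left or right portion means
decompressing the whole old block first, which is one face of the
buffer-size problem mentioned above.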
Even better would be a flag to open(2) that retrieves the file in
raw format (i.e. compressed, if it is stored compressed).
Think about web servers! Most of them send static files in compressed
form if the client accepts it. That'd be a huge benefit for serving
static pages. Also think about sendfile(2), which could then send
compressed data directly. This would make it much more usable in a
web server and would avoid having to store a compressed copy of the file.
The same infrastructure could also be used to implement file-oriented
encryption. Basically every algorithm that works on a stream of data
would be possible. For encryption, a special system call could
associate a key with an open file descriptor, or set a default key
for the whole process.
Regards,
Michael