Futures - HAMMER comparison testing?

Sat Jan 19 04:10:07 PST 2008

Matthew Dillon wrote:
:But - at the end of the day - how much [extra?] on-disk space will be 
:needed to ensure mount 'as-of' is 'good enough' for some realistic span 
:(a week?, a month?)? 'Forever' may be too much to ask.

    The amount of disk needed is precisely the same as the amount of
    historical data (different from current data) that must be retained,
    plus record overhead.

    So it comes down to how much space you are willing to eat up to store
    the history, and what kind of granularity you will want for the history.

:How close are we to being able to start predicting that storage-space 
:efficiency relative to ${some_other_fs}?
:
:Bill

    Ultimately it will be extremely efficient simply by the fact that
    there will be a balancer going through it and repacking it.

    For the moment (and through the alpha release) it will be fairly
    inefficient because it is using fixed 16K data records, even for small
    files.  The on-disk format doesn't care... records can reference 
    variable-length data from around 1MB down to 64 bytes.  But supporting
    variable-length data requires implementing some overwrite cases that
    I don't want to do right now.  This only applies to regular files
    of course.  Directories store directory entries as records, not as data,
    so directories are packed really nicely. 
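
To put a number on the fixed-record cost described above, here is a
minimal sketch. It assumes every data record occupies exactly 16K and
ignores per-record overhead; the names RECORD_SIZE, allocated_bytes and
efficiency are hypothetical and not taken from HAMMER.

```python
RECORD_SIZE = 16 * 1024  # fixed data record size mentioned in the post


def allocated_bytes(file_size: int, record_size: int = RECORD_SIZE) -> int:
    """Data space used when a file is stored in fixed-size records.

    Assumption: an empty file needs no data record, and record
    (metadata) overhead is not counted.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    records = -(-file_size // record_size)  # ceiling division
    return records * record_size


def efficiency(file_size: int) -> float:
    """Fraction of the allocated data space that holds real file data."""
    used = allocated_bytes(file_size)
    return file_size / used if used else 1.0


if __name__ == "__main__":
    for size in (100, 4096, 16384, 16385):
        print(size, allocated_bytes(size), round(efficiency(size), 4))
```

Under these assumptions a 100-byte file still occupies one full 16K
record, which is why small files are the inefficient case.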

    e.g. if you have one record representing, say, 1MB of data, and you
    write 64 bytes right smack in the middle of that, the write code will
    have to take that one record, mark it as deleted, then create three
    records to replace it (one pointing to the unchanged left portion of
    the original data, one pointing to the 64 bytes of overwritten data,
    and one pointing to the unchanged right portion of the original data).
    The recovery and deletion code will also have to deal with that sort
    of overlaid data situation.  I'm not going to be writing that
    feature for a bit.  There are some quick hacks I can do too, for
    small files, but it's not on my list prior to the alpha release.
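
The three-way split described above can be sketched roughly as follows.
This is only an illustration of the idea, not HAMMER's code: the Record
fields and the split_on_overwrite helper are hypothetical, and the
sketch only handles a write that falls entirely inside one record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    file_off: int          # offset in the file where this record starts
    length: int            # number of bytes the record covers
    data_id: str           # which stored data block the record points at
    data_off: int          # offset inside that data block
    deleted: bool = False  # old records are marked, not destroyed


def split_on_overwrite(old: Record, write_off: int, write_len: int,
                       new_data_id: str) -> List[Record]:
    """Mark `old` deleted and return the records that replace it."""
    old_end = old.file_off + old.length
    write_end = write_off + write_len
    if write_len <= 0 or write_off < old.file_off or write_end > old_end:
        raise ValueError("write must fall inside the old record")

    old.deleted = True  # history is kept; the record is only marked
    out: List[Record] = []

    left_len = write_off - old.file_off
    if left_len > 0:  # unchanged left portion, still in the old data
        out.append(Record(old.file_off, left_len, old.data_id, old.data_off))

    # the overwritten bytes live in a freshly written data block
    out.append(Record(write_off, write_len, new_data_id, 0))

    right_len = old_end - write_end
    if right_len > 0:  # unchanged right portion, still in the old data
        skip = write_end - old.file_off
        out.append(Record(write_end, right_len, old.data_id,
                          old.data_off + skip))
    return out
```

For a 1MB record and a 64-byte write in the middle, this yields three
records whose lengths still add up to 1MB, while the original record
stays around (marked deleted) for as-of access.
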

To me it seems that this makes compression somewhat easier to
implement, mainly because the old data is kept. So when you overwrite
compressed data, the system would only compress the new data and
introduce pointers to the left and right portions of the old
data. What might be complicated to handle is the difference in size
between compressed and uncompressed data in all the buffers.
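
A rough sketch of what I mean, using zlib purely as a stand-in
compressor. Everything here is an assumption for illustration: the
in-memory `blobs` store, the (data_id, offset, length) extent tuples,
and the helper names are all made up, and offsets are expressed in
uncompressed bytes.

```python
import zlib
from typing import Dict, List, Tuple

Extent = Tuple[str, int, int]   # (data_id, uncompressed offset, length)
blobs: Dict[str, bytes] = {}    # data_id -> compressed bytes


def store(data_id: str, raw: bytes) -> None:
    blobs[data_id] = zlib.compress(raw)


def read_extent(ext: Extent) -> bytes:
    data_id, off, length = ext
    # The whole blob is decompressed even for a small slice; this is
    # the compressed-vs-uncompressed size mismatch mentioned above.
    return zlib.decompress(blobs[data_id])[off:off + length]


def read_file(extents: List[Extent]) -> bytes:
    return b"".join(read_extent(e) for e in extents)


def overwrite(extents: List[Extent], pos: int, new_raw: bytes,
              new_id: str) -> List[Extent]:
    """Overwrite in place; only `new_raw` is compressed, old blobs stay."""
    end = pos + len(new_raw)
    if not new_raw or pos < 0 or end > sum(e[2] for e in extents):
        raise ValueError("write must fall inside the existing file")
    store(new_id, new_raw)
    out: List[Extent] = []
    cur, inserted = 0, False
    for data_id, off, length in extents:
        ext_end = cur + length
        if cur < pos:                      # part left of the write
            out.append((data_id, off, min(length, pos - cur)))
        if not inserted and ext_end > pos:
            out.append((new_id, 0, len(new_raw)))
            inserted = True
        if ext_end > end:                  # part right of the write
            skip = max(0, end - cur)
            out.append((data_id, off + skip, length - skip))
        cur = ext_end
    return out
```

The old compressed blob is never rewritten; the left and right extents
simply keep pointing into it.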

If this yields a compression ratio of 2x, it could be extremely useful,
especially given the historical nature of HAMMER (deleted files take up
less space).

Even better would be a flag to open(2) that retrieves the file in
raw format (i.e. compressed, if it is stored compressed).
Think about web servers! Most of them send static files in compressed
form if the client accepts it. That would be a huge benefit for serving
static pages. Also think about sendfile(2), which could then send
compressed data directly. This would make it much more usable in a
web server and would avoid storing a separate compressed copy of the file.
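
As a sketch of the web-server case: no such raw flag exists, so the
StoredFile class below merely simulates it in user space. The read_raw
method, the respond helper, and the choice of gzip as the on-disk
format are all assumptions for illustration, and the Accept-Encoding
parsing is deliberately simplistic (q-values are ignored).

```python
import gzip
from typing import Dict, Tuple


class StoredFile:
    """Stand-in for a file the filesystem keeps compressed on disk."""

    def __init__(self, raw: bytes) -> None:
        self._stored = gzip.compress(raw)

    def read_raw(self) -> bytes:
        # Hypothetical: what a 'raw' open(2) flag would hand back.
        return self._stored

    def read(self) -> bytes:
        # Normal read path: the filesystem decompresses transparently.
        return gzip.decompress(self._stored)


def respond(f: StoredFile, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Pick the response body based on the client's Accept-Encoding."""
    codings = [t.split(";")[0].strip().lower()
               for t in accept_encoding.split(",")]
    if "gzip" in codings:
        # Send the stored bytes as-is; no recompression, no second copy.
        return {"Content-Encoding": "gzip"}, f.read_raw()
    return {}, f.read()
```

A client that accepts gzip gets the stored bytes untouched; everyone
else gets the decompressed contents through the normal read path.
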

The same infrastructure could also be used to implement file-oriented
encryption. Basically every algorithm that works on a stream of data
would be possible. For encryption, a special system call could
associate a key with an open file descriptor, or set a default key to
use for the whole process.

Regards,

  Michael