[GSoC] HAMMER compression and new unionfs

Michael Neumann mneumann at ntecs.de
Tue Mar 29 01:47:23 PDT 2011

On Tuesday, 29.03.2011, at 00:48 +0900, Naohiro Aota wrote:
> Hi,
> I'm Naohiro Aota, an undergraduate student at Osaka University, Japan.
> Last year I participated in GSoC with Gentoo and worked on porting the
> Gentoo system to DragonFly. Since then I have been very interested in the
> DragonFly kernel, so I'd like to take part in GSoC with some DragonFly
> kernel work this year. I've read the project page and am interested in
> these two ideas: HAMMER compression and new unionfs. (Yes, I like
> filesystems ;))
> I have some questions about the ideas.
> About HAMMER compression:
> - Could "compression could be turned on a per-file" also mean that all
>   files under "/foo" get compressed?

Individual blocks of data will be compressed, so a single file may
contain both compressed and uncompressed data blocks. You only have to
record a flag saying whether a given block is compressed, and
compress/uncompress it transparently before passing it to/from the
buffer cache. The decision whether to compress a block when writing a
file can be made in several ways: a filesystem-wide flag (all files
created within this filesystem are compressed by default), a recursively
inherited per-directory flag (a new file created inside this directory
is compressed), or, also feasible, compression done by the reblocker,
i.e. as a background process, so that you never directly write
compressed data "online" (this could be a starting point).
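
To make the per-block flag and the flag lookup a bit more concrete, here
is a rough userland sketch in Python. It is not HAMMER code: zlib, the
16 KB BLOCK_SIZE, and all function names are assumptions chosen only for
illustration, and walking up the directory tree at write time is just
one possible way to model an inherited per-directory flag.

```python
import posixpath
import zlib

BLOCK_SIZE = 16384  # hypothetical block size, for illustration only


def pack_block(data, compress):
    """Return (is_compressed, payload) for one data block."""
    if compress:
        packed = zlib.compress(data)
        # Keep the compressed form only when it actually saves space.
        if len(packed) < len(data):
            return True, packed
    return False, data


def unpack_block(is_compressed, payload):
    """Undo pack_block before the data is handed to the cache."""
    return zlib.decompress(payload) if is_compressed else payload


def write_file(data, compress):
    """Split data into blocks; each block carries its own flag."""
    return [pack_block(data[off:off + BLOCK_SIZE], compress)
            for off in range(0, len(data), BLOCK_SIZE)]


def read_file(blocks):
    """Reassemble the file, decompressing only the flagged blocks."""
    return b"".join(unpack_block(flag, payload) for flag, payload in blocks)


def should_compress(dir_flags, path, fs_default):
    """Nearest flagged ancestor directory wins, else the fs-wide default.

    dir_flags is a hypothetical mapping such as {"/foo": True}.
    """
    d = posixpath.dirname(path)
    while True:
        if d in dir_flags:
            return dir_flags[d]
        if d in ("/", ""):
            return fs_default
        d = posixpath.dirname(d)
```

With this model a file can end up with a mix of compressed and
uncompressed blocks, and a reader never needs to know which policy was
in effect when a block was written; it only looks at the per-block flag.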

As we keep historical data for a longer period of time (this is how
HAMMER works and we like it), compression could increase the amount of
historical data that we can store. As most of the historical data is
accessed only very infrequently (it mainly serves as a backup),
decompression does not need to be extremely fast (IMHO), but of course
acceptable performance is desirable (because disk reads are slow,
compression could even lead to faster access).
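
The "faster access" remark can be checked with some back-of-the-envelope
arithmetic. The sketch below is only a simple model: it assumes the disk
read and the decompression do not overlap, and every number in it (file
size, throughput, compression ratio) is made up for illustration, not
measured on HAMMER.

```python
def raw_read_seconds(logical_bytes, disk_bps):
    """Time to read the data uncompressed from disk."""
    return logical_bytes / disk_bps


def compressed_read_seconds(logical_bytes, ratio, disk_bps, decompress_bps):
    """Time to read the compressed bytes, then decompress them.

    ratio is uncompressed size divided by compressed size;
    decompress_bps is measured in bytes of uncompressed output per second.
    """
    on_disk = logical_bytes / ratio
    return on_disk / disk_bps + logical_bytes / decompress_bps


# Hypothetical numbers: 100 MB of data, a 50 MB/s disk,
# a 2:1 compression ratio, 200 MB/s decompression speed.
raw = raw_read_seconds(100e6, 50e6)
packed = compressed_read_seconds(100e6, 2.0, 50e6, 200e6)
print(raw, packed)
```

Under these assumed numbers the compressed read comes out ahead. In this
model compression wins whenever the decompressor is faster than
disk_bps * ratio / (ratio - 1), so the result depends entirely on the
real disk and the real algorithm.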

> - Do file size measurement commands such as "df", "du" and "ls" also need
>   to change? (actual disk space used and file size may differ if compressed)

I think it will be enough to display the uncompressed file size, not the
compressed one, so no changes should be required. Note that we also have
deduplication, and IMHO "du" and "ls" will not show the actual disk
space used in that case either.
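
For reference, the two numbers in question both come from stat(2): the
logical file size (st_size, what "ls -l" prints) and the allocated block
count (st_blocks, which "du" sums up). A small Python sketch to look at
both, assuming a Unix-like system where st_blocks is counted in 512-byte
units; how HAMMER would fill in the block count for compressed or
deduplicated data is not something this sketch says anything about.

```python
import os


def logical_and_allocated(path):
    """Return (logical size, allocated bytes) for a file."""
    st = os.stat(path)
    logical = st.st_size            # uncompressed size as seen by "ls -l"
    allocated = st.st_blocks * 512  # 512-byte units assumed, as "du" does
    return logical, allocated
```

The proposal above amounts to keeping the first number unchanged, so
applications that rely on the file size keep working as before.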


