[GSOC] HAMMER2 compression feature week8 report

Sat Aug 10 16:46:18 PDT 2013

Hello everyone,
here is my report for week 8.

This week was dedicated mostly to bug-hunting, so I don't have more test
results to show you for now. The most important result of this week is
that, with the invaluable help of Matthew Dillon, all known bugs were
fixed. Right now I'm not getting any file corruption on read tests, or any
of the other issues that used to happen in certain specific, but not
unusual, circumstances. It should be mentioned, though, that so far I've
only tested the file system with simple cp/diff tests. More realistic and
stressful tests will follow in the near future, but the code is already
much more stable than it was before. Still, I expect those tests to
uncover some new bugs.

As a result of the bug-hunting, the write path changed a bit once again.
Now we have hammer2_write_file(), hammer2_write_core(), hammer2_write_bp(),
hammer2_zero_check_and_write(), test_block_not_zeros(), zero_write() and
hammer2_compress_and_write().

hammer2_write_file() is the main function and is the first to execute in
the write path. hammer2_write_core() is called at the end of
hammer2_write_file(); its job is to determine which route the write will
take depending on the settings. hammer2_write_bp() simply writes the
logical buffer without performing any compression on it, and it's executed
when no compression is set. If zero-checking is set,
hammer2_zero_check_and_write() is executed instead.
hammer2_zero_check_and_write() tests the block with test_block_not_zeros()
and executes hammer2_write_bp() if it's not a zero-filled block. If it is a
zero-filled block, zero_write() is executed instead. Finally,
hammer2_compress_and_write() corresponds to the LZ4 compression path. It
uses test_block_not_zeros() and zero_write() too.
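
To illustrate the routing described above, here is a minimal user-space
sketch in C. It is not the actual HAMMER2 code: the enum names,
block_not_zeros() and route_write() are hypothetical stand-ins, and it
assumes that the LZ4 path performs its zero check before attempting
compression. It only reports which route a block would take.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings and outcomes, for illustration only. */
enum comp_mode { COMP_NONE, COMP_ZERO_CHECK, COMP_LZ4 };
enum write_action { ACT_WRITE_RAW, ACT_ZERO_WRITE, ACT_COMPRESS };

/* Returns 1 if any byte in the block is non-zero, 0 if it is all zeros. */
int block_not_zeros(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 1;
    }
    return 0;
}

/* Picks the route for one logical block based on the compression setting. */
enum write_action route_write(enum comp_mode mode,
                              const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    switch (mode) {
    case COMP_ZERO_CHECK:
        /* Zero-filled blocks skip the normal buffer write. */
        return block_not_zeros(buf, len) ? ACT_WRITE_RAW : ACT_ZERO_WRITE;
    case COMP_LZ4:
        /* Assumption: zero check first, compression only for real data. */
        return block_not_zeros(buf, len) ? ACT_COMPRESS : ACT_ZERO_WRITE;
    case COMP_NONE:
    default:
        /* No compression set: write the logical buffer as-is. */
        return ACT_WRITE_RAW;
    }
}
```

In this sketch a zero-filled block is written raw only when no compression
is set; with either of the other two settings it takes the zero-write
route.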

Aside from bug-hunting, a couple of small optimizations were also
implemented in the write path. One of them is that we now try to detect
incompressible files. Basically, we count the number of contiguous blocks
that failed to compress. If the number reaches 8, we turn off compression
and don't try to compress any further blocks. The blocks have to be
contiguous, since the counter is reset to 0 whenever a block compresses
successfully. The reasoning behind this is that the types of files that
compress well, like source code and logs, mostly have all of their blocks
compressible from start to finish, so if the first blocks failed to
compress, it's most likely not worth trying anymore. It's unlikely that a
file will have its first 512KB incompressible but the rest compressible.
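
The counter logic above can be sketched roughly as follows. This is a
user-space illustration, not the code from the repository: struct
comp_heuristic, should_try_compress() and record_result() are hypothetical
names, and the sketch assumes the counter is kept per file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Give up after this many consecutive failed compression attempts. */
#define FAIL_LIMIT 8

/* Hypothetical per-file state. */
struct comp_heuristic {
    int failed;   /* number of contiguous blocks that failed to compress */
};

/* True while it is still worth trying to compress the next block. */
bool should_try_compress(const struct comp_heuristic *h)
{
    assert(h != NULL);
    return h->failed < FAIL_LIMIT;
}

/* Updates the counter after one compression attempt. */
void record_result(struct comp_heuristic *h, bool compressed)
{
    assert(h != NULL);
    if (compressed)
        h->failed = 0;          /* a success breaks the run of failures */
    else if (h->failed < FAIL_LIMIT)
        h->failed++;            /* saturate at the limit */
}
```

A caller would check should_try_compress() before each block and call
record_result() only after an actual attempt, so once the limit is reached
the counter never changes again and compression stays off.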

Another small improvement is that we now detect the end of the file and
compress only the bytes that actually contain data instead of the whole
logical block.
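
As a rough illustration of the end-of-file case, the number of bytes worth
compressing in a given logical block could be computed as below. The
function name valid_bytes() and its parameters are hypothetical and do not
come from the HAMMER2 source; the block size is passed in rather than
assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns how many bytes of the logical block starting at file offset
 * `lbase` actually hold file data, given the current file size.
 */
size_t valid_bytes(uint64_t file_size, uint64_t lbase, size_t block_size)
{
    uint64_t remaining;

    assert(block_size > 0);

    if (lbase >= file_size)
        return 0;                      /* block lies entirely past EOF */

    remaining = file_size - lbase;
    if (remaining < block_size)
        return (size_t)remaining;      /* last, partially filled block */
    return block_size;                 /* block is completely inside the file */
}
```

For example, with 65536-byte blocks and a 100000-byte file, the second
block (offset 65536) would hold 34464 bytes of data, so only those bytes
would be handed to the compressor.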

Both optimizations are at an early stage and may be improved in the future.

My next step will be to change the write path once again so that it starts
using threads. I will also use more sophisticated and realistic tests to
ensure the stability of the feature.

My code is available, as usual, in my repository [1].

I'd appreciate any comments, suggestions and criticism. Thank you.

Daniel

[1] git://leaf.dragonflybsd.org/~iostream/dragonfly.git