[GSOC] HAMMER2 compression feature week6 report

Sat Jul 27 19:42:01 PDT 2013

Hello everyone,
here is my report for the 6th week.

During this week a couple of significant changes happened.

The first is that I had to return to using an intermediary buffer in the
read path because of a tricky bug that corrupted the decompressed result.
So it no longer decompresses directly from the physical buffer to the
logical one; once again, it first decompresses the data into an
intermediary buffer, from which it is then copied to the logical buffer.

The second is that I no longer allocate those intermediary buffers right
before using them. Matthew Dillon suggested that I use a special data
structure called objcache, which handles the allocations automatically. I
now use it to obtain the buffers when I need them, without performing an
allocation every time a block of data has to be compressed or
decompressed. The objcache itself is created/destroyed when the HAMMER2
module is loaded/unloaded, so the overhead is extremely small, even though
the structure is created on every module load, even if the user doesn't
use the compression feature.
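
The idea of paying the allocation cost once and then reusing buffers can be
illustrated with a small userspace sketch. This is not the DragonFly
objcache API and not the code in my branch; the names buf_pool, pool_init,
pool_get, pool_put and pool_destroy, as well as the pool depth and buffer
size, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BUFS   4        /* hypothetical pool depth */
#define POOL_BUFSZ  65536    /* hypothetical buffer size in bytes */

struct buf_pool {
    void *free_list[POOL_BUFS];  /* stack of idle buffers */
    int   nfree;                 /* number of idle buffers on the stack */
};

/* Called once, analogous to module load: allocate every buffer up front. */
static int
pool_init(struct buf_pool *p)
{
    p->nfree = 0;
    for (int i = 0; i < POOL_BUFS; i++) {
        void *b = malloc(POOL_BUFSZ);
        if (b == NULL) {
            /* Undo the partial setup before reporting failure. */
            while (p->nfree > 0)
                free(p->free_list[--p->nfree]);
            return -1;
        }
        p->free_list[p->nfree++] = b;
    }
    return 0;
}

/* Hand out an idle buffer without allocating; NULL if the pool is empty. */
static void *
pool_get(struct buf_pool *p)
{
    return (p->nfree > 0) ? p->free_list[--p->nfree] : NULL;
}

/* Return a buffer to the pool so that the next caller can reuse it. */
static void
pool_put(struct buf_pool *p, void *b)
{
    assert(b != NULL);
    if (p->nfree < POOL_BUFS)
        p->free_list[p->nfree++] = b;
    else
        free(b);    /* pool already full: just release the buffer */
}

/* Called once, analogous to module unload: release every idle buffer. */
static void
pool_destroy(struct buf_pool *p)
{
    while (p->nfree > 0)
        free(p->free_list[--p->nfree]);
}
```

In this sketch the compress and decompress paths would call pool_get()
before touching a block and pool_put() afterwards, so malloc() is only
reached in pool_init(). A real objcache does considerably more than this
(for example, it is safe to use from several CPUs at once).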

The result is that, even though I still use intermediary buffers, the
performance improved a lot. Since I promised in my previous report that
I'd provide some numbers, here they are. I provide them even though there
is room for improvement and I'll be working on optimizing the objcache
parameters as well as some other things. Still, I think it's interesting to
take a quick look at them just to get an approximate idea of where things
stand right now.

It should be noted, though, that these results don't measure the
reading/writing performance directly and, also, that they were obtained on
a virtual machine, which means that on real hardware the performance would
probably be better. Also note that the same disk is shared between the VM
and the host OS, which introduces further uncertainty into the numbers.

Now, on to the test description. I used 5 test cases for both the write
and read paths.

Case 1 is a small file that can't be compressed. It's a 2.2 MB JPEG image.
Case 2 is a big file that can't be compressed. It's a 77.2 MB video file.
Case 3 is a small file that compresses perfectly. It's a 3.5 MB TIFF image.
Case 4 is a big file that compresses perfectly. It's a 47.5 MB log file.
Case 5 is, finally, a bunch of files, some of which do not compress, some
compress partially, and some compress perfectly. There are 35 files with a
total size of 184.9 MB, and they include the files used in the previous
cases.

For each case I copied the file/files with the cp command from the HAMMER
partition to the HAMMER2 partition into 2 different folders, one without
compression and the other with LZ4 compression. I did this 10 times for
each folder, measuring the time with the “time” utility (total elapsed
time). Then I compared the files on the HAMMER2 partition with the
originals on the HAMMER partition using the diff command, again 10 times
and again using the “time” utility to measure how long it takes. I also
remounted the HAMMER2 partition between each diff to ensure that there
wouldn't be any caching that would affect the results.

So, in the case of cp we have, roughly, the time it takes to read from the
HAMMER partition + the time it takes to write to the HAMMER2 partition, and
in the case of diff we have the time it takes to read from the HAMMER
partition + the time it takes to read from the HAMMER2 partition + whatever
time it takes to compare the files.

The difference that matters to us is the one between the time spent on the
HAMMER2 part without compression and the time spent on the HAMMER2 part
with compression. Let's see the difference.

You can see the summarized results in this table [1].

In the case of files that can't be compressed, we can see that writing is
slower when compression is turned on. This happens because even though the
file can't be compressed, the file system tries to compress each block,
fails, and thus wastes some time on that. To address this issue, Matthew
Dillon suggested trying to detect a file that can't be compressed and not
compressing it (detection is done by counting the number of contiguous
blocks for which the compression failed, so that compression isn't tried
again after a certain number is reached). With this improvement, it's
possible that the difference between the time spent on the write paths
with and without compression wouldn't be perceivable. I'll try to
implement this improvement later.
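
A minimal sketch of how such a counter could look is given here. It is only
an illustration of the suggestion, not code from my branch: the names
comp_heuristic, comp_should_try and comp_record are hypothetical, and the
limit of 8 blocks is an arbitrary assumption. I also assume that a
successful block resets the count, since the count is of contiguous
failures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMP_FAIL_LIMIT 8   /* hypothetical threshold, not from the report */

struct comp_heuristic {
    int failed_run;   /* contiguous blocks that failed to compress */
};

/* Should the write path still try to compress the next block? */
static bool
comp_should_try(const struct comp_heuristic *h)
{
    assert(h != NULL);
    return h->failed_run < COMP_FAIL_LIMIT;
}

/* Record the outcome of one compression attempt. */
static void
comp_record(struct comp_heuristic *h, bool compressed_ok)
{
    assert(h != NULL);
    if (compressed_ok)
        h->failed_run = 0;          /* a success breaks the failing run */
    else if (h->failed_run < COMP_FAIL_LIMIT)
        h->failed_run++;            /* saturate at the limit */
}
```

The write path would keep one such counter per file, call
comp_should_try() before each block and comp_record() after each attempt.
Once the limit is reached no further attempts are made, so nothing resets
the counter and the rest of the file is written uncompressed.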

On the other hand, there is no significant difference when it comes to read
time. This is probably because the read path with compression just checks
whether or not the specific block is compressed and only tries to
decompress it if it is. So there shouldn't be much difference.

The case of files that can be perfectly compressed is interesting. When it
comes to write time, it's no different from the previous case: the only
difference is that compression doesn't fail now, but it's still slower
than the write path without compression.

The read path with compression actually seems to be a bit faster than the
read path without compression in this particular case. This is probably
because the hard drive needs to read less data than in the case without
compression. Since in our case the compression counts as successful only
when a block is compressed to 50% of its size or less, if all blocks are
compressed, the resulting size of a file is significantly smaller. Also,
the LZ4 decompression algorithm is so fast that it doesn't affect the
overall time much.
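
The 50% rule mentioned above can be written down as a small check. This is
just a sketch of the rule as stated in this report; the function name
comp_is_worthwhile is hypothetical, and treating a compressed size of 0 as
a failed attempt is my assumption for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if the compressed block should be kept: it must be at most half
 * the size of the original block. A compressed size of 0 is treated as a
 * failed compression attempt (assumption for this sketch).
 */
static bool
comp_is_worthwhile(size_t logical_size, size_t compressed_size)
{
    assert(logical_size > 0);
    return compressed_size > 0 && compressed_size * 2 <= logical_size;
}
```

For example, with a hypothetical 65536-byte block, a result of 32768 bytes
or less would be stored compressed, while a result of 32769 bytes would be
written uncompressed.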

Finally, the case of the group of files seems to follow the trend. In the
real world the performance will depend on the type of files.

In conclusion, for now we can be sure that regardless of whether files can
be compressed or not, writing will be slower on the write path with
compression than on the write path without it. On the other hand, the read
path with compression will probably be a bit faster if the files were
compressed, and will have the same speed as the read path without
compression if they weren't. It looks like it's possible to optimize the
write path with compression so that it won't be significantly slower than
the one without it when the file can't be compressed.

For the rest of the weekend and the next week I'll be working on several
things. Even though I said previously that I had implemented
zero-checking, in fact it's not implemented correctly and I'll work on it
again. Also, the code overall needs to be cleaned up, because it's
extremely messy right now, and there surely are many things to optimize,
for example the parameters of the objcache and the write path with
compression. I also need to continue with bug-hunting...

I'd appreciate any comments, suggestions and criticism. All the code is
available in my repository, branch “hammer2_LZ4” [2].

Daniel

[1] http://leaf.dragonflybsd.org/~iostream/performance_table.html
[2] git://leaf.dragonflybsd.org/~iostream/dragonfly.git