[GSOC] HAMMER2 compression feature week10 report

Daniel Flores daniel5555 at gmail.com
Sun Aug 25 11:46:57 PDT 2013


Hello everyone,

here is my report for week 10.

This week I worked on the two things I mentioned in my previous report: I
ran a couple of stress tests and I started adding a new algorithm as an
alternative to LZ4.

Regarding the tests, I tried Blogbench and, sadly, at this point it ends
in a crash. I'll work on this next week and, hopefully, it will be solved
soon. I expected that some new bugs would be discovered and, apparently,
that's what happened.

As for the new algorithm: when I started this project I initially
considered several alternatives to LZ4 with similar characteristics.
However, Matthew suggested changing the whole approach, so instead of
another fast algorithm I decided to try adding a slower algorithm that
gives a better compression ratio.

The reasoning behind this is that another algorithm similar to LZ4
wouldn't really add anything new, because the difference probably
wouldn't be perceptible at all. It would be just another fast algorithm
that compresses logs, source code files and some types of uncompressed
files well, and we already have all of that. Instead, we decided to try a
different type of algorithm, like gzip or xz, that would be slower (but,
most likely, still usable given the performance of modern CPUs) and would
compress files like plain texts (where LZ4 mostly doesn't perform well)
and more types of uncompressed files.

So I had to choose between gzip and xz. In the end I decided to go with
gzip: even though xz has a better compression ratio, it is also
significantly slower, and my concern was that it wouldn't be usable. gzip
generally gives a very good compression ratio and is significantly faster
than xz.
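
The trade-off described above can be tried out in user space with a rough
sketch like the following. It is not the HAMMER2 code: it uses Python's
standard zlib module (DEFLATE, the algorithm behind gzip) and lzma module
(the algorithm behind xz), and the sample data and the helper name
"measure" are made up for illustration. Timings depend heavily on the
machine and the data, so no figures are claimed here.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Compress data once and return (name, compressed size, seconds taken)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    # Round-trip check: a result that cannot be restored is useless.
    assert decompress(packed) == data
    return name, len(packed), elapsed


# Hypothetical sample: repetitive plain text, loosely similar to a mail archive.
sample = b"".join(
    b"Message %d: the quick brown fox jumps over the lazy dog.\n" % i
    for i in range(5000)
)

results = [
    measure("zlib (DEFLATE, as in gzip)",
            lambda d: zlib.compress(d, 6), zlib.decompress, sample),
    measure("lzma (as in xz)", lzma.compress, lzma.decompress, sample),
]

for name, size, seconds in results:
    print(f"{name}: {len(sample)} -> {size} bytes in {seconds:.3f} s")
```

Running it on data that resembles the real workload (for example a mailing
list archive) gives a first impression of how much ratio is gained and how
much time is paid for it.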

Right now this alternative algorithm is added to the existing write path
and it looks like it works correctly. To activate it, option "3" was added
to the hammer2 "setcomp" command. At this point I can't provide numerical
results, but it does feel slower than LZ4. Still, it seems to be usable
and it does work much better with files like plain texts (such as mailing
list archives).

I used the ZLIB library, which proved to be very convenient to use, but,
just as with LZ4, it contains a lot of functions that we don't really
need. The current code is also very dirty and inefficient, because first
of all I wanted it to work; now I'll be improving it and trying to make it
as efficient as possible. For example, there are 2 different functions
that perform compression in the write path, one for LZ4 and another for
ZLIB, but they are almost identical, so most likely they will be unified,
among other changes.
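
To illustrate what unifying two near-identical compression functions could
look like, here is a minimal user-space sketch in Python. It is not the
kernel code and makes several assumptions: the names "compress_block",
"_try_compress" and "COMPRESSORS" are hypothetical, zlib at level 1 is
only a stand-in for LZ4 (which is not in the Python standard library), and
the rule of storing a block uncompressed when compression does not shrink
it is an assumption rather than something stated in this report.

```python
import zlib


def _try_compress(data, compress):
    """Shared logic: keep the compressed form only if it saves space."""
    packed = compress(data)
    if len(packed) < len(data):
        return True, packed
    return False, data  # incompressible block: store it as-is (assumption)


# Hypothetical method table; only the compressor differs between entries.
COMPRESSORS = {
    "fast": lambda d: zlib.compress(d, 1),  # stand-in for LZ4, NOT real LZ4
    "zlib": lambda d: zlib.compress(d, 6),
}


def compress_block(data, method):
    """Return (is_compressed, payload) for one block using the given method."""
    if method == "none":
        return False, data
    return _try_compress(data, COMPRESSORS[method])


text_block = b"hello world\n" * 300
compressed, payload = compress_block(text_block, "zlib")
print(compressed, len(text_block), len(payload))
```

With this shape, adding another algorithm means adding one table entry
instead of copying the whole function.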

You can check out my current code in a new branch called
“hammer2_compression” in my repository [1]; it includes both the LZ4 and
ZLIB compression paths. The “hammer2_LZ4” branch will contain just the LZ4
code, but I think it will get certain modifications too (which will be
merged into “hammer2_compression”).

In next week's report I hope to provide performance figures, such as a
comparison between no compression, LZ4 compression and ZLIB compression.
I'll also be working on improving the code quality and fixing the bugs
found so far.

I would appreciate any comments, suggestions and criticism.

Thank you.


Daniel

[1] git://leaf.dragonflybsd.org/~iostream/dragonfly.git