[GSOC] HAMMER2 compression feature week11 report
Daniel Flores
daniel5555 at gmail.com
Sun Sep 1 21:51:47 PDT 2013
Hello,
here is my report for week 11.
As we approach the end of GSOC, the code for the new feature is becoming
more stable and less likely to change. So, as promised in the previous
report, this report presents a performance comparison between the
different compression modes. It is unlikely that any new algorithm will be
added in the remaining weeks. Most likely I'll focus on polishing the
current code, bug-hunting, and running some additional tests.
Some changes were made in the code. The most important is that we now
have one compression function for both compression algorithms, which
makes the code more compact. The ZLIB-related code is also much more
compact and basically contains only what we need, even though some
additional cleanup is still needed. In addition, the current code never
turns compression off completely when an incompressible file is detected;
instead, it retries compression every 512KB of the file.
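The retry behavior can be illustrated with a small sketch. This is not the
HAMMER2 code: the function name, the 64KB block size and the rule that a
single failed block suspends compression until 512KB further into the file
are all assumptions made for the example.

```python
# Illustrative model only; names and the exact rule are assumptions.
BLOCK_SIZE = 64 * 1024        # assumed logical block size
RETRY_INTERVAL = 512 * 1024   # "retries the compression every 512KB"


def plan_attempts(block_results, block_size=BLOCK_SIZE,
                  retry_interval=RETRY_INTERVAL):
    """Return, per block, whether compression would be attempted.

    block_results[i] is True if block i compresses successfully when tried.
    After a failed attempt, compression is skipped until the file offset
    has advanced by retry_interval bytes, then tried again.
    """
    attempts = []
    next_try = 0  # byte offset at which compression is next attempted
    for index, compressible in enumerate(block_results):
        offset = index * block_size
        if offset < next_try:
            attempts.append(False)   # still inside the skip window
            continue
        attempts.append(True)
        if not compressible:
            next_try = offset + retry_interval
    return attempts
```

Under these assumptions a fully incompressible file is only probed at
blocks 0, 8, 16 and so on, while a compressible file is compressed on
every block.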
Now let's move on to performance results. It should be mentioned that the
strength of ZLIB compression can be adjusted and I tested specifically
levels 6 (default) and 9 (best compression ratio). Currently the user can't
tune this compression level value.
First of all, I wanted to determine how much is gained in compression by
using ZLIB instead of LZ4, so I looked at how many blocks ended up
compressed and what the actual size of those blocks was for different
kinds of files. These tests were performed in the actual file system, not
in a prototype application, to ensure that the results reflect real-life
use.
I tested a .jpg image, two .wav files, a couple of text files, a perfectly
compressible .tif image and a couple of log files.
For the JPEG image and the .wav files ZLIB compression was unsuccessful,
which is not surprising. However, ZLIB turned out to be very effective
with text files that LZ4 can't compress at all: it managed to reduce all
blocks of those files from 64KB to 32KB. A very nice result. There was no
difference between ZLIB level 6 and ZLIB level 9, though.
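One possible reading of the identical level 6 and level 9 results is that
stored block sizes are rounded up to a power of two, so two slightly
different compressed sizes can land in the same 32KB bucket. The sketch
below illustrates that idea with Python's zlib module. The 1KB minimum and
64KB maximum are assumptions taken from the block sizes reported in this
message, and the helper names are hypothetical.

```python
import zlib

MIN_PHYS = 1024          # assumed smallest physical block size
MAX_PHYS = 64 * 1024     # assumed logical (uncompressed) block size


def physical_size(nbytes, min_size=MIN_PHYS, max_size=MAX_PHYS):
    """Round nbytes up to the next power-of-two size within the limits."""
    size = min_size
    while size < nbytes and size < max_size:
        size *= 2
    return size


def stored_size(block, level):
    """Physical size a block would occupy after zlib at the given level."""
    return physical_size(len(zlib.compress(block, level)))
```

For example, compressed sizes of 20000 and 30000 bytes both occupy a 32KB
block under this model, so a modest gain from level 9 would not show up in
the block counts.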
When it comes to the .tif file, both LZ4 and ZLIB managed to compress all
the blocks, but many blocks compressed by ZLIB ended up smaller than
blocks compressed by LZ4.
In the case of this particular file the result was the following:
Total number of blocks – 57
LZ4:
1KB – 1
2KB – 1
4KB – 1
8KB – 7
16KB – 43
32KB – 4
ZLIB level 6:
1KB – 2
2KB – 1
4KB – 13
8KB – 36
16KB – 5
ZLIB level 9:
1KB – 2
2KB – 5
4KB – 10
8KB – 38
16KB – 2
As you can see, not only did ZLIB give a better result than LZ4, but there
is also a significant difference between ZLIB level 6 and ZLIB level 9.
Something similar seems to happen with log files: ZLIB gives better
results than LZ4 and, apparently, there is also a difference between ZLIB
level 6 and ZLIB level 9, even though I can't present the exact numbers
for now.
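To put the .tif block counts above into a single figure, the following
sketch adds up the space used under each mode. The histograms are copied
from the lists above; the 64KB uncompressed block size used for the
baseline is an assumption based on the 64KB figure mentioned for text
files.

```python
# Block-size histograms for the .tif file: {size in KB: number of blocks}
LZ4 = {1: 1, 2: 1, 4: 1, 8: 7, 16: 43, 32: 4}
ZLIB6 = {1: 2, 2: 1, 4: 13, 8: 36, 16: 5}
ZLIB9 = {1: 2, 2: 5, 4: 10, 8: 38, 16: 2}

UNCOMPRESSED_BLOCK_KB = 64  # assumed logical block size


def block_count(hist):
    """Total number of blocks in a histogram."""
    return sum(hist.values())


def total_kb(hist):
    """Total space in KB occupied by all blocks in a histogram."""
    return sum(size * count for size, count in hist.items())


def baseline_kb(hist, block_kb=UNCOMPRESSED_BLOCK_KB):
    """Space the same blocks would take if stored uncompressed."""
    return block_count(hist) * block_kb
```

Each histogram sums to 57 blocks. The totals come to 879KB for LZ4, 424KB
for ZLIB level 6 and 388KB for ZLIB level 9, against 3648KB if all 57
blocks were stored as full 64KB blocks.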
Now let's move to the other side of performance, which is execution time.
I measured the total elapsed time of the cp (write performance) and diff
(read performance) commands using the time utility and scripts that either
copied the specified files from HAMMER to HAMMER2 or diff'ed the originals
on HAMMER against the copies on HAMMER2. Each script was executed 10 times
and the HAMMER2 partition was remounted between executions.
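The measurement loop can be sketched roughly as follows. This is not the
script that was actually used: the function names are hypothetical, the
remount step is left as a caller-supplied hook because the exact mount
commands are not given here, and the hook runs before every execution,
including the first.

```python
import statistics
import subprocess
import time


def time_runs(action, runs=10, between_runs=None):
    """Call action() `runs` times and return the elapsed seconds per run.

    between_runs, if given, is called before each run; in the tests
    described above it would remount the HAMMER2 partition.
    """
    timings = []
    for _ in range(runs):
        if between_runs is not None:
            between_runs()
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return timings


def run_argv(argv):
    """Return an action that runs a command such as cp or diff.

    The exit status is ignored because diff exits non-zero when files
    differ.
    """
    def action():
        subprocess.run(argv, check=False,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    return action


def summarize(timings):
    """Return (mean, minimum, maximum) of a list of timings."""
    return statistics.mean(timings), min(timings), max(timings)
```

A call such as time_runs(run_argv(["cp", src, dst]), between_runs=remount)
would then give ten timings for one file, with src, dst and remount
supplied by the caller.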
You can see the results for write performance here [1] and for read
performance here [2].
Both LZ4 and ZLIB are usable, but ZLIB may have a significant impact on
performance and the user must be aware of it. The difference between LZ4
and ZLIB isn't that big for incompressible files (.jpg and .tar.gz) and
small compressible files (.tif and .txt), but it's huge for big
compressible files (logs). Very interestingly, it looks like the strongest
ZLIB compression level (level 9) is either no different from or even
slightly faster overall than the default compression level. Also, reading
files compressed with ZLIB seems to be slightly faster than reading files
compressed with LZ4.
There is no difference between writing/reading small files without
compression and with compression of any type. For bigger files, writing
without compression is slightly faster than writing with LZ4, while for
reading, the compressed files almost always offer a speed advantage, with
ZLIB level 9 being the winner.
It should be noted that all these tests were performed on a virtual
machine, which means that on real hardware the performance would be
better. I must also note that this is not a comparison between the
algorithms themselves, but only between their behavior in very specific
circumstances, that is, in the HAMMER2 file system, where they can't
perform at their full speed or compression ratio.
Next week I hope to present some tests of files that don't fit neatly
into the compressible/incompressible categories, as well as a performance
test of the zero-checking feature. I also hope to improve the stability of
the code and present the results of some stress tests.
You can check out the files used for the timing tests (except the log
file, for security reasons; please tell me if you need to take a look at
it): 1.jpg[3], every.wav[4], mike.wav[5], book1[6], frymire.tif[7].
You can check out all the current code in my repository [8], branch
“hammer2_compression”.
I'd appreciate any comments, suggestions and criticism.
Daniel
[1] http://leaf.dragonflybsd.org/~iostream/write_performance.html
[2] http://leaf.dragonflybsd.org/~iostream/read_performance.html
[3] http://leaf.dragonflybsd.org/~iostream/1.jpg
[4] http://leaf.dragonflybsd.org/~iostream/every.wav
[5] http://leaf.dragonflybsd.org/~iostream/mike.wav
[6] http://leaf.dragonflybsd.org/~iostream/book1
[7] http://leaf.dragonflybsd.org/~iostream/frymire.tif
[8] git://leaf.dragonflybsd.org/~iostream/dragonfly.git