[GSOC] HAMMER2 compression feature week11 report

Petr Janda elekktretterr at exemail.com.au
Sun Sep 1 22:54:15 PDT 2013


Why can't LZ4 compress text files?

Petr

On 2/09/2013 2:51 PM, Daniel Flores wrote:
> Hello,
> here is my report for week 11.
> 
> As we are approaching the end of GSOC, the code for the new feature is
> becoming more stable and less likely to change. So, in this report, as
> promised in my previous report, I present a performance comparison
> between the different compression modes. It is unlikely that any new
> algorithm will be added in the coming weeks; most likely, I'll focus on
> polishing the current code, bug-hunting, and performing some additional
> tests.
> 
> Some changes were made in the code. The most important is that there is
> now a single compression function for both compression algorithms,
> which makes the code more compact. The ZLIB-related code is also much
> more compact and basically contains only what we need, although some
> additional cleanup is still needed. Also, compression is no longer
> turned off completely when an incompressible file is detected; instead,
> it is retried every 512KB of the file.
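The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not code from the hammer2_compression branch: the struct, the function names and the exact policy (stop compressing after a block fails to shrink, then try again once 512KB more of the file has been written) are assumptions based only on the description in this report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Retry interval taken from the report: 512KB. */
#define COMP_RETRY_INTERVAL (512u * 1024u)

/* Hypothetical per-file heuristic state (not a real HAMMER2 structure). */
struct comp_heuristic {
	bool   incompressible;  /* last attempt failed to shrink the block */
	size_t bytes_skipped;   /* bytes written uncompressed since then */
};

/* Decide whether the next block of 'len' bytes should be compressed. */
static bool
comp_should_try(struct comp_heuristic *h, size_t len)
{
	if (!h->incompressible)
		return true;
	h->bytes_skipped += len;
	if (h->bytes_skipped >= COMP_RETRY_INTERVAL) {
		/* 512KB passed since the failure: give compression another chance. */
		h->incompressible = false;
		h->bytes_skipped = 0;
		return true;
	}
	return false;
}

/* Record whether the last compression attempt actually saved space. */
static void
comp_record_result(struct comp_heuristic *h, bool shrank)
{
	h->incompressible = !shrank;
	h->bytes_skipped = 0;
}
```

With 64KB blocks, this skips seven blocks after a failed attempt and tries again on the eighth, so an incompressible file costs one compression attempt per 512KB instead of one per block.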
> 
> Now let's move on to the performance results. Note that the strength of
> ZLIB compression can be adjusted; I tested specifically levels 6 (the
> default) and 9 (best compression ratio). Currently the user can't tune
> this compression level.
> 
> First of all, I wanted to determine how much is gained by using ZLIB
> instead of LZ4, so I looked at how many blocks ended up compressed and
> what the actual size of those blocks was for different kinds of files.
> These tests were performed on the actual file system, not in a prototype
> application, so that the results reflect real-life use.
> 
> I tested a .jpg image, two .wav files, a couple of text files, a highly
> compressible .tif image and a couple of log files.
> 
> For the JPEG image and the .wav files, ZLIB compression was
> unsuccessful, which is not surprising. However, ZLIB turned out to be
> very effective on text files that LZ4 can't compress at all: it reduced
> every block of those files from 64KB to 32KB. A very nice result. There
> was no difference between ZLIB level 6 and ZLIB level 9, though.
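Why the text-file blocks land on exactly 32KB: the block-size histogram for the .tif file only ever shows power-of-two sizes between 1KB and 64KB, which suggests compressed output is stored in the smallest power-of-two block that can hold it. The sketch below assumes exactly that (1KB minimum, 64KB logical block, and a block that gains nothing is stored uncompressed); the macro and function names are hypothetical.

```c
#include <stddef.h>

#define BLK_MIN (1024u)        /* assumed smallest allocation: 1KB */
#define BLK_MAX (64u * 1024u)  /* logical block size used in the report: 64KB */

/*
 * Return the power-of-two block size needed to hold 'comp_len' bytes of
 * compressed data. A result of BLK_MAX means compression saved no space,
 * so the block would be stored uncompressed.
 */
static size_t
comp_alloc_size(size_t comp_len)
{
	size_t sz = BLK_MIN;

	while (sz < comp_len && sz < BLK_MAX)
		sz <<= 1;
	return sz;
}
```

Under this assumption, a 64KB text block that compresses to 30,000 bytes occupies a 32KB block, while one that compresses to 33,000 bytes gains nothing, which is one reason results cluster on a few discrete sizes.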
> 
> For the .tif file, both LZ4 and ZLIB managed to compress all the
> blocks, but many blocks compressed by ZLIB ended up smaller than the
> blocks compressed by LZ4. For this particular file the result was the
> following:
> 
> Total number of blocks: 57
> 
> Block size   LZ4   ZLIB level 6   ZLIB level 9
> 1KB            1              2              2
> 2KB            1              1              5
> 4KB            1             13             10
> 8KB            7             36             38
> 16KB          43              5              2
> 32KB           4              0              0
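Weighting each bucket by its size gives the total space the 57 blocks of frymire.tif occupy under each mode. The code below only does that arithmetic on the counts in the table; the 32KB counts of 0 for ZLIB follow from each column summing to 57, and the names are illustrative.

```c
#include <stddef.h>

/* Block-size buckets from the histogram, in KB. */
static const unsigned bucket_kb[] = { 1, 2, 4, 8, 16, 32 };
#define NBUCKETS (sizeof(bucket_kb) / sizeof(bucket_kb[0]))

/* Per-bucket block counts from the report (each column sums to 57). */
static const unsigned lz4[NBUCKETS]   = { 1, 1,  1,  7, 43, 4 };
static const unsigned zlib6[NBUCKETS] = { 2, 1, 13, 36,  5, 0 };
static const unsigned zlib9[NBUCKETS] = { 2, 5, 10, 38,  2, 0 };

/* Total space used: sum of (block count * block size), in KB. */
static unsigned
total_kb(const unsigned counts[NBUCKETS])
{
	unsigned sum = 0;

	for (size_t i = 0; i < NBUCKETS; i++)
		sum += counts[i] * bucket_kb[i];
	return sum;
}
```

This gives 879KB for LZ4, 424KB for ZLIB level 6 and 388KB for ZLIB level 9, against at most 57 * 64 = 3648KB if every block were stored as a full uncompressed 64KB block.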
> 
> As you can see, not only did ZLIB give a better result, but there is
> also a significant difference between ZLIB level 6 and ZLIB level 9.
> Something similar seems to happen with log files, where ZLIB gives
> better results than LZ4 and there also appears to be a difference
> between levels 6 and 9, although I can't present exact numbers yet.
> 
> Now, let's move on to the other side of performance: execution time.
> I measured the total elapsed time of the cp (write performance) and diff
> (read performance) commands using the time utility and scripts that
> copied the specified files (from HAMMER to HAMMER2) or diff'ed them
> (originals on HAMMER against copies on HAMMER2). Each script was
> executed 10 times, and the HAMMER2 partition was remounted between
> executions.
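Ten runs per script can be reduced to one figure plus a spread. The helper below is one possible way to summarize the elapsed times reported by time(1) (mean, minimum and maximum); it is not taken from the report's scripts, whose contents are not shown, and the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical summary of repeated timing runs, in seconds. */
struct run_summary {
	double mean;
	double min;
	double max;
};

/* Summarize 'n' (> 0) elapsed times, e.g. the 10 runs of one cp or diff script. */
static struct run_summary
summarize_runs(const double *secs, size_t n)
{
	struct run_summary s = { 0.0, secs[0], secs[0] };
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		sum += secs[i];
		if (secs[i] < s.min)
			s.min = secs[i];
		if (secs[i] > s.max)
			s.max = secs[i];
	}
	s.mean = sum / (double)n;
	return s;
}
```

Reporting the minimum and maximum alongside the mean makes it easier to judge whether a gap between, say, ZLIB level 6 and level 9 is larger than the run-to-run noise of a virtual machine.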
> 
> You can see the results for write performance here [1] and for read
> performance here [2].
> Both LZ4 and ZLIB are usable; however, ZLIB may have a significant
> impact on performance, and users must be aware of it. The difference
> between LZ4 and ZLIB isn't that big for incompressible files (.jpg and
> .tar.gz) and small compressible files (.tif and .txt), but it's huge for
> big compressible files (logs). Interestingly, the strongest ZLIB
> compression level (9) appears to be either no different from, or even
> slightly faster than, the default level overall. Also, reading files
> compressed with ZLIB seems to be slightly faster than reading files
> compressed with LZ4.
> 
> There is no difference between writing/reading small files with or
> without compression of any type. For bigger files, writing without
> compression is slightly faster than writing with LZ4, while for reading,
> compressed files almost always offer a speed advantage, with ZLIB
> level 9 being the winner.
> 
> It should be noted that all these tests were performed on a virtual
> machine, so performance on real hardware would likely be better. I must
> also note that this is not a comparison between the algorithms
> themselves, but only between their behavior in very specific
> circumstances, that is, inside the HAMMER2 file system, where they can't
> perform at their full speed or compression ratio.
> 
> Next week I hope to present tests of files that don't fall neatly into
> the compressible or incompressible category, as well as a performance
> test of the zero-checking feature. I also hope to improve the stability
> of the code and present the results of some stress tests.
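For readers unfamiliar with the zero-checking feature mentioned above: assuming it works by detecting blocks that contain only zero bytes, so they need not be compressed or stored as data, the core test is a simple scan like the one below. The function name is hypothetical and this is not code from the branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte of the 'len'-byte buffer is zero. */
static bool
block_is_zero(const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0)
			return false;  /* any non-zero byte disqualifies the block */
	}
	return true;
}
```

The scan stops at the first non-zero byte, so for ordinary data it usually costs very little; a full pass is only paid for blocks that really are (or nearly are) all zeros.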
> 
> You can check out the files used for the timing tests (except the log
> file, which is withheld for security reasons; please tell me if you
> need to take a look at it): 1.jpg[3], every.wav[4], mike.wav[5],
> book1[6], frymire.tif[7].
> You can check out all the current code in my repository [8], branch
> “hammer2_compression”.
> 
> I would appreciate any comments, suggestions and criticism.
> 
> 
> Daniel
> 
> [1] http://leaf.dragonflybsd.org/~iostream/write_performance.html
> [2] http://leaf.dragonflybsd.org/~iostream/read_performance.html
> [3] http://leaf.dragonflybsd.org/~iostream/1.jpg
> [4] http://leaf.dragonflybsd.org/~iostream/every.wav
> [5] http://leaf.dragonflybsd.org/~iostream/mike.wav
> [6] http://leaf.dragonflybsd.org/~iostream/book1
> [7] http://leaf.dragonflybsd.org/~iostream/frymire.tif
> [8] git://leaf.dragonflybsd.org/~iostream/dragonfly.git
> <http://leaf.dragonflybsd.org/~iostream/dragonfly.git>


-- 
Please use PGP to encrypt your email to ensure our privacy is respected.



More information about the Kernel mailing list