<div dir="ltr"><div>Hello everyone,<br></div><div>here is my report for the 6th week.</div><div><br></div><div>A couple of significant changes happened this week.</div><div><br></div><div>The first is that I had to return to using an intermediary buffer in the read path because of a tricky bug that corrupted the decompressed result. So decompression no longer goes directly from the physical buffer to the logical one; once again, the data is first decompressed into an intermediary buffer and then copied from there to the logical buffer.</div>

<div><br></div><div>The second is that I no longer allocate those intermediary buffers right before using them. Matthew Dillon suggested that I use a special data structure called objcache, which handles the allocations automatically. I now obtain the buffers from it whenever I need them, without performing an allocation every time a block of data has to be compressed or decompressed. The objcache itself is created when the HAMMER2 module is loaded and destroyed when it is unloaded, so the overhead is extremely small, even though the structure is always created, including when the user doesn't use the compression feature.</div>
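<div><br></div><div>To illustrate the idea, here is a minimal userland sketch of a cache of fixed-size buffers: a small stack of idle buffers that is created once and reused on every get/put. It is not the DragonFly objcache API and not the code from the repository; all names (buf_cache, buf_cache_get and so on) are hypothetical.</div>

```c
#include <stdlib.h>

/* Hypothetical userland stand-in for a kernel object cache.
 * This is NOT the DragonFly objcache API. */
struct buf_cache {
    size_t buf_size;    /* size of every buffer handed out, in bytes */
    size_t capacity;    /* maximum number of idle buffers kept around */
    size_t nfree;       /* number of idle buffers currently cached */
    void **free_list;   /* stack of idle buffers */
};

/* Create the cache once, e.g. at module load. Returns NULL on failure. */
struct buf_cache *buf_cache_create(size_t buf_size, size_t capacity)
{
    struct buf_cache *c;

    if (buf_size == 0 || capacity == 0)
        return NULL;
    c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->free_list = malloc(capacity * sizeof(*c->free_list));
    if (c->free_list == NULL) {
        free(c);
        return NULL;
    }
    c->buf_size = buf_size;
    c->capacity = capacity;
    c->nfree = 0;
    return c;
}

/* Hand out an idle buffer if one is cached, otherwise allocate a new one. */
void *buf_cache_get(struct buf_cache *c)
{
    if (c->nfree > 0)
        return c->free_list[--c->nfree];
    return malloc(c->buf_size);
}

/* Return a buffer; keep it for reuse unless the cache is already full. */
void buf_cache_put(struct buf_cache *c, void *buf)
{
    if (c->nfree < c->capacity)
        c->free_list[c->nfree++] = buf;
    else
        free(buf);
}

/* Free all idle buffers and the cache itself, e.g. at module unload. */
void buf_cache_destroy(struct buf_cache *c)
{
    while (c->nfree > 0)
        free(c->free_list[--c->nfree]);
    free(c->free_list);
    free(c);
}
```

<div>In this sketch, buf_cache_create would be called once at load time and buf_cache_destroy once at unload time, with buf_cache_get and buf_cache_put bracketing each compression or decompression of a block.</div>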

<div><br></div><div>The result is that performance improved a lot, even though I still use intermediary buffers. Since I promised in my previous report to provide some numbers, here they are. There is still room for improvement, and I'll be working on optimizing the objcache parameters as well as some other things, but I think it's interesting to take a quick look at the numbers to get an approximation of where things stand right now.</div>

<div><br></div><div>It should be noted, though, that these results don't measure read/write performance directly, and that they were obtained on a virtual machine, so performance on real hardware would be better. Also note that the same disk is shared between the VM and the host OS, which introduces further uncertainty into the numbers.</div>

<div><br></div><div>Now, on to the test description. I used 5 test cases for both the write and the read path.</div><div><br></div><div>Case 1 is a small file that can't be compressed. It's a 2.2 MB JPEG image.</div><div>Case 2 is a big file that can't be compressed. It's a 77.2 MB video file.</div><div>Case 3 is a small file that compresses perfectly. It's a 3.5 MB TIFF image.</div><div>Case 4 is a big file that compresses perfectly. It's a 47.5 MB log file.</div>

<div>Case 5 is, finally, a group of files, some of which do not compress, some compress partially, and some compress perfectly. There are 35 files with a total size of 184.9 MB, and they include the files used in the previous cases.</div>

<div><br></div><div>For each case I copied the file or files with the cp command from a HAMMER partition to a HAMMER2 partition into 2 different folders, one without compression and the other with LZ4 compression. I did this 10 times for each folder, measuring the total elapsed time with the “time” utility. Then I compared the files on the HAMMER2 partition with the originals on the HAMMER partition using the diff command, again 10 times and again timing each run with “time”. I also remounted the HAMMER2 partition between diffs to ensure that no caching would affect the results.</div>

<div><br></div><div>So, for cp we have, roughly, the time it takes to read from the HAMMER partition plus the time it takes to write to the HAMMER2 partition, and for diff we have the time it takes to read from the HAMMER partition plus the time it takes to read from the HAMMER2 partition plus whatever time it takes to compare the files.</div>

<div><br></div><div>What matters for us is the difference between the time spent on the HAMMER2 side without compression and the time spent on it with compression, since the HAMMER side is the same in both runs. Let's see that difference.</div>

<div><br></div><div>You can see the summarized results in this table [1].</div><div><br></div><div>For files that can't be compressed, writes are slower when compression is turned on. This happens because, even though the file can't be compressed, the file system still tries to compress each block, fails, and thus wastes some time. To address this, Matthew Dillon suggested detecting files that can't be compressed and not compressing them. Detection is done by counting the number of contiguous blocks for which compression failed; once a certain number is reached, compression isn't tried again. With this improvement, the difference between the write paths with and without compression might no longer be perceivable. I'll try to implement it later.</div>
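<div><br></div><div>The detection heuristic could look roughly like the sketch below. It is only an illustration of counting contiguous failures, not the eventual implementation: the struct, the function names and the limit of 8 blocks are all assumptions made for the example.</div>

```c
#include <stdbool.h>

/* Hypothetical limit: give up after this many failures in a row. */
#define COMP_FAIL_LIMIT 8

/* Hypothetical per-file state for the heuristic. */
struct comp_state {
    unsigned consecutive_failures;  /* contiguous blocks that failed to compress */
};

/* Should the write path still attempt to compress the next block? */
bool comp_should_try(const struct comp_state *s)
{
    return s->consecutive_failures < COMP_FAIL_LIMIT;
}

/* Record the outcome of one compression attempt. A success resets the
 * counter; a failure increments it until the limit is reached. */
void comp_record(struct comp_state *s, bool compressed_ok)
{
    if (compressed_ok)
        s->consecutive_failures = 0;
    else if (s->consecutive_failures < COMP_FAIL_LIMIT)
        s->consecutive_failures++;
}
```

<div>Once comp_should_try returns false no further attempts are recorded, so the counter stays at the limit and the rest of the file is written uncompressed, which matches the idea of not trying again after a certain number of failures.</div>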

<div><br></div><div>On the other hand, there is no significant difference in read time. This is probably because the read path with compression just checks whether a given block is compressed and only decompresses it if it is, so there shouldn't be much difference.</div>

<div><br></div><div>The case of files that compress perfectly is interesting. The write time behaves as in the previous case: the only difference is that compression doesn't fail now, but the write path is still slower than without compression.</div>

<div><br></div><div>The read path with compression actually seems to be a bit faster than the read path without compression in this particular case. This is probably because the hard drive needs to read less data than without compression. Since in our case compression counts as successful only when a block is compressed to 50% of its size or less, the resulting file is significantly smaller if all blocks are compressed. Also, the LZ4 decompression algorithm is so fast that it doesn't add much to the overall time.</div>
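<div><br></div><div>The 50% rule can be written down as a small check. This is a sketch under stated assumptions rather than the actual HAMMER2 code: the function names are hypothetical, a compressed size of 0 is assumed to mean that the compressor reported failure, and the stored size is taken to be exactly the compressed size, ignoring any rounding the file system may apply.</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Compression counts as successful only if the block shrank to 50% of
 * its logical size or less. compressed_size == 0 is assumed to mean
 * that the compressor itself reported a failure. */
bool compression_succeeded(size_t logical_size, size_t compressed_size)
{
    return compressed_size > 0 && compressed_size * 2 <= logical_size;
}

/* Bytes that would be stored for the block under this rule: the
 * compressed size on success, the full logical size otherwise. */
size_t stored_size(size_t logical_size, size_t compressed_size)
{
    if (compression_succeeded(logical_size, compressed_size))
        return compressed_size;
    return logical_size;
}
```

<div>Under this rule a 65536-byte block that compresses to 32768 bytes is stored compressed, while one that only compresses to 32769 bytes is stored as is, so a fully compressed file occupies at most half of its logical size.</div>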

<div><br></div><div>Finally, the case of the group of files seems to follow the trend. In the real world the performance will depend on the type of files.</div><div><br></div><div>In conclusion, for now we can be sure that, regardless of whether files can be compressed or not, writes will be slower with compression than without it. On the other hand, the read path with compression will probably be a bit faster when the files were compressed, and will have the same speed as the read path without compression when they weren't. It looks like the write path with compression can be optimized so that it won't be significantly slower than the path without it when the file can't be compressed.</div>

<div><br></div><div>For the rest of the weekend and next week I'll be working on several things. Even though I said previously that I had implemented zero-checking, it's in fact not implemented correctly and I'll work on it again. The code overall also needs to be cleaned up, because it's extremely messy right now, and there surely are many things to optimize, for example the objcache parameters and the write path with compression. I also need to continue with bug-hunting...</div>

<div><br></div><div>I'd appreciate any comments, suggestions and criticism. All the code is available in my repository, branch “hammer2_LZ4” [2].</div><div><br></div><div><br></div><div>Daniel</div><div><br></div><div>

[1] <a href="http://leaf.dragonflybsd.org/~iostream/performance_table.html">http://leaf.dragonflybsd.org/~iostream/performance_table.html</a></div><div>[2] git://<a href="http://leaf.dragonflybsd.org/~iostream/dragonfly.git">leaf.dragonflybsd.org/~iostream/dragonfly.git</a></div>

</div>