Hammer2 offline deduplication

Mikhail Novosyolov m.novosyolov at rosalinux.ru
Thu Nov 7 14:07:42 PST 2019



On November 7, 2019, 17:31:51 GMT+03:00, Justin Sherrill <justin at shiningsilence.com> wrote:
>Are you looking for how it works technically, or how to operate it?

Both.

>The design document goes into the technical details of how it's set
>up:
>
>http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN

Thanks. I forgot about this document. I found "dedup" and "de-dup" there. What I call "offline" deduplication is called "bulk" deduplication in it. It states that bulk dedup is possible but not implemented yet.

>The design notes mention that, like you say, online dedup is on by
>default.  There's no utility to run it offline yet, though I don't
>know how much of a difference that makes in practice;

With online dedup, the decision about where to write data is made before the data is written. With offline (bulk) dedup, the data is already on disk, so afterwards it must either be relocated or have its duplicate parts replaced with references to the same data stored elsewhere, while keeping access overhead minimal.
The author of bees, an offline deduplicator for btrfs, explained his algorithm here: https://github.com/Zygo/bees/issues/116#issuecomment-549002363
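
To make clear what I mean by the scanning half of bulk dedup, here is a rough Python sketch. It is only an illustration under my own assumptions, not how Hammer2 or bees work: the function name find_duplicate_blocks is made up, the 64 KiB block size and SHA-256 are arbitrary choices, and it only reports candidate blocks. Actually replacing a duplicate with a reference has to be done by the filesystem itself.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 64 * 1024  # assumed block size, for illustration only


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Map each block digest to the (path, offset) locations sharing it.

    Hypothetical helper: it scans data that is already on disk, which is
    the "offline" part, and changes nothing.
    """
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                seen[digest].append((path, offset))
                offset += len(block)
    # Keep only digests seen more than once: these are dedup candidates.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Each returned entry lists places holding identical data, so all but one of them could in principle become references to the remaining copy.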

Does the Hammer2 design somehow facilitate or simplify bulk deduplication?

I don't like that there is only online dedup, because under various circumstances, such as low available RAM, not all files may be deduplicated in time. Also, I don't know what is stored in RAM: is the hash of a specific file kept there for deduplicating its copies?
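
To illustrate the concern, here is a toy Python model of an online deduplicator with a bounded in-memory table. This is purely my assumption about how such a thing could behave, not Hammer2's actual data structure, which I don't know. The class name OnlineDedupTable, the LRU eviction and SHA-256 are all made up for the example.

```python
import hashlib
from collections import OrderedDict


class OnlineDedupTable:
    """Toy bounded table of recently written block digests (LRU eviction).

    Hypothetical model only: it shows how a limited table can miss
    duplicates, not how any real filesystem does it.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()  # digest -> location of the first copy

    def write(self, block, location):
        """Return the location the written block ends up referencing."""
        digest = hashlib.sha256(block).digest()
        if digest in self.table:
            self.table.move_to_end(digest)  # mark as recently used
            return self.table[digest]       # duplicate: reuse existing block
        self.table[digest] = location
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the oldest digest
        return location                     # stored as a new block
```

In this model a second copy written soon after the first is deduplicated, but once the digest has been evicted a later copy is stored again in full. That is the kind of missed duplicate that a bulk pass could pick up afterwards.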

Is there a way to see which deduplications have already been made on an existing filesystem?

So, in general, for now online dedup in Hammer2 is outside my control and awareness; I don't even know whether any files in my 1 TB storage have been deduplicated.

>
>On Wed, Nov 6, 2019 at 2:25 PM Mikhail Novosyolov
><m.novosyolov at rosalinux.ru> wrote:
>>
>> Hi,
>>
>> I have not been able to find documentation about how deduplication works
>> in Hammer2. I have only found that online deduplication is now on by
>> default, but there seems to be no description how it works.
>>
>> Is it possible to make offline deduplication? I mean finding blocks with
>> the same checksum and making one block point to another instead of
>> spending space on disk twice? There are several utilities that do this
>> on BTRFS in Linux, for example https://github.com/Zygo/bees
>>

-- 
Sorry for the brevity, sent from K-9 Mail.


