Hammer2 offline deduplication

Mikhail Novosyolov m.novosyolov at rosalinux.ru
Sat Nov 9 08:56:11 PST 2019


The problem here is that the user can't control the buffer, and that it is 
reset on reboot, for example. In real use cases (mine at least) this 
prevents deduplication from having any effect: it simply does not happen 
when the buffer is empty.
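
To illustrate why an empty buffer matters, here is a toy Python sketch of
the behaviour Matthew describes below. It is not Hammer2 code and makes no
claim about its real data structures: the ToyStore class, its write() and
reboot() methods and the use of SHA-256 are all my own assumptions, chosen
only to show an in-memory index that is lost on reboot.

```python
import hashlib


class ToyStore:
    """Toy block store with a volatile dedup index (hypothetical, not Hammer2)."""

    def __init__(self):
        self.blocks = []   # "on-disk" blocks, these survive a reboot
        self.refs = []     # one entry per write, pointing into self.blocks
        self.cache = {}    # digest -> block index, kept in RAM only

    def write(self, data: bytes) -> int:
        digest = hashlib.sha256(data).digest()
        idx = self.cache.get(digest)
        if idx is not None and self.blocks[idx] == data:
            # Matching live data is still cached: reference the existing block.
            self.refs.append(idx)
            return idx
        # No match in the cache: a new block is written, duplicate or not.
        self.blocks.append(data)
        idx = len(self.blocks) - 1
        self.cache[digest] = idx
        self.refs.append(idx)
        return idx

    def reboot(self):
        # The in-memory index is gone; the blocks on disk stay as they are.
        self.cache.clear()


store = ToyStore()
payload = b"x" * 65536
store.write(payload)
store.write(payload)   # deduplicated: the digest is still cached
store.reboot()
store.write(payload)   # stored again: the cache was emptied
print(len(store.blocks), len(store.refs))
```

In this model the copy written after the reboot takes up space a second
time even though identical data is already on disk, which is the situation
I keep running into.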

On 09.11.2019 10:45, Matthew Dillon wrote:
> H2's de-duplication only works on live data, there is no 
> background/off-line de-dup operation.  This generally works when you 
> do something like a 'cp' operation, that is when copying files or 
> whole directory trees.   Basically if H2 sees data that is already in 
> the buffer cache that matches new data being written to the 
> filesystem, it will attempt to reference the existing block(s) instead 
> of writing out new blocks with the same data.
>
> -Matt
>
> On Thu, Nov 7, 2019 at 2:07 PM Mikhail Novosyolov 
> <m.novosyolov at rosalinux.ru> wrote:
>
>
>
>     On November 7, 2019, 17:31:51 GMT+03:00, Justin Sherrill
>     <justin at shiningsilence.com> wrote:
>     >Are you looking for how it works technically, or how to operate it?
>
>     Both.
>
>     >The design document goes into the technical details of how it's set
>     >up:
>     >
>     >http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN
>
>     Thanks. I forgot about this document. Found "dedup" and "de-dup"
>     there. "Offline" deduplication is called "bulk" deduplication. It
>     is stated that bulk dedup is possible but not implemented yet.
>
>     >The design notes mention that, like you say, online dedup is on by
>     >default.  There's no utility to run it offline yet, though I don't
>     >know how much of a difference that makes in practice;
>
>     When dedupping online, the decision where to write data is made
>     before writing it. When dedupping offline (bulk), data is first
>     written to the disk and then either has to be relocated, or part
>     of it has to be replaced with a reference to the same data in
>     another place, while keeping access overhead minimal.
>     The author of the bees offline dedupper for btrfs explained his
>     algorithm here:
>     https://github.com/Zygo/bees/issues/116#issuecomment-549002363
>
>     Does the Hammer2 design somehow facilitate or simplify bulk
>     deduplication?
>
>     I don't like that there is only online dedup, because under
>     various circumstances and with little available RAM, not all files
>     may be deduplicated in time. Also, I don't know what is stored in
>     RAM: is the hash of a specific file kept there for deduplicating
>     its copies?
>
>     Is there a way to see which deduplications have already been made
>     on an existing filesystem?
>
>     So, in general, for now online dedup in Hammer2 is outside my
>     control and awareness; I don't even know if any files in my 1 TB
>     storage were deduplicated.
>
>     >
>     >On Wed, Nov 6, 2019 at 2:25 PM Mikhail Novosyolov
>     ><m.novosyolov at rosalinux.ru> wrote:
>     >>
>     >> Hi,
>     >>
>     >> I have not been able to find documentation about how
>     >> deduplication works in Hammer2. I have only found that online
>     >> deduplication is now on by default, but there seems to be no
>     >> description of how it works.
>     >>
>     >> Is it possible to do offline deduplication? I mean finding
>     >> blocks with the same checksum and making one block point to
>     >> another instead of spending space on disk twice? There are
>     >> several utilities that do this on BTRFS in Linux, for example
>     >> https://github.com/Zygo/bees
>     >>
>
>     -- 
>     Sorry for the brevity, sent from K-9 Mail.
>
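
To make the question quoted above more concrete, this is roughly what I
mean by bulk deduplication, as a minimal Python sketch. It is only an
illustration under my own assumptions: it is neither the bees algorithm nor
anything that exists in Hammer2 (the DESIGN document says bulk dedup is not
implemented yet), and bulk_dedup(), the block list and the file map are
hypothetical stand-ins for a real on-disk layout.

```python
import hashlib
from collections import defaultdict


def bulk_dedup(blocks, file_map):
    """Point duplicate blocks at a single copy.

    blocks   -- list of bytes objects; the index is a hypothetical block address
    file_map -- dict mapping a file name to its list of block addresses
    Returns (new_file_map, freed_addresses).
    """
    # Pass 1: group block addresses by checksum.
    by_digest = defaultdict(list)
    for addr, data in enumerate(blocks):
        by_digest[hashlib.sha256(data).digest()].append(addr)

    # Pass 2: within each group keep the first block and remap the rest.
    remap = {}
    for addrs in by_digest.values():
        keep = addrs[0]
        for addr in addrs[1:]:
            # Compare the bytes too, so a hash collision cannot merge
            # blocks that actually differ.
            if blocks[addr] == blocks[keep]:
                remap[addr] = keep

    # Pass 3: rewrite the file references; remapped blocks can be freed.
    new_map = {
        name: [remap.get(a, a) for a in addrs]
        for name, addrs in file_map.items()
    }
    return new_map, sorted(remap)


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096, b"B" * 4096]
files = {"orig": [0, 1], "copy": [2, 4], "other": [3]}
new_files, freed = bulk_dedup(blocks, files)
print(new_files)
print(freed)
```

Unlike the online case, nothing here depends on what happens to be in RAM
at write time: the scan can run at any moment, at the cost of reading every
block and rewriting references after the data is already on disk.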