<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>The problem here is that the user can't control the buffer, and
that it is reset on reboot, for example. In real use cases (mine at
least) this keeps deduplication from having any effect: it simply
does not happen because the buffer is empty.<br>
</p>
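<p>To make this concrete, here is a minimal Python sketch of the
failure mode. It is a toy model under my own assumptions, not
Hammer2 code: ToyStore, write() and reboot() are hypothetical names,
the dict stands in for the buffer cache, and SHA-256 is just a
convenient way to key it.</p>
<pre>
```python
import hashlib


class ToyStore:
    """Toy block store with online dedup against an in-memory cache."""

    def __init__(self):
        self.blocks = []   # stands in for blocks on disk
        self.cache = {}    # maps digest to block index; lives in RAM only

    def write(self, data):
        """Store data and return the index of the block that holds it."""
        digest = hashlib.sha256(data).digest()
        index = self.cache.get(digest)
        if index is not None and self.blocks[index] == data:
            return index   # dedup hit: reference the existing block
        self.blocks.append(data)
        index = len(self.blocks) - 1
        self.cache[digest] = index
        return index

    def reboot(self):
        """Disk contents survive, the in-memory cache does not."""
        self.cache.clear()


store = ToyStore()
first = store.write(b"same payload")
second = store.write(b"same payload")   # cache is warm: deduplicated
store.reboot()
third = store.write(b"same payload")    # cache is empty: stored again
print(first, second, third, len(store.blocks))
```
</pre>
<p>In this toy model the third write stores the payload a second
time, because nothing in memory remembers the block that is already
on disk.</p>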
<div class="moz-cite-prefix">09.11.2019 10:45, Matthew Dillon wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAOZ7CpChJE8GkmCZuz2LUfxe2UEF3oNBjCRN50T8qXQqUO-X7g@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">H2's de-duplication only works on live data; there
is no background/off-line de-dup operation. This generally
works when you do something like a 'cp' operation, that is, when
copying files or whole directory trees. Basically, if H2 sees
data that is already in the buffer cache that matches new data
being written to the filesystem, it will attempt to reference
the existing block(s) instead of writing out new blocks with the
same data.
<div>
<div>
<div><br>
</div>
<div>-Matt</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Nov 7, 2019 at 2:07 PM
Mikhail Novosyolov <<a
href="mailto:m.novosyolov@rosalinux.ru"
moz-do-not-send="true">m.novosyolov@rosalinux.ru</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
On 7 November 2019 17:31:51 GMT+03:00, Justin Sherrill <<a
href="mailto:justin@shiningsilence.com" target="_blank"
moz-do-not-send="true">justin@shiningsilence.com</a>>
wrote:<br>
>Are you looking for how it works technically, or how to
operate it?<br>
<br>
Both.<br>
<br>
>The design document goes into the technical details of how
it's set<br>
>up:<br>
><br>
><a
href="http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN</a><br>
<br>
Thanks, I had forgotten about this document. I searched it for
"dedup" and "de-dup": what I called "offline" deduplication is
called "bulk" deduplication there. It states that bulk dedup is
possible but not implemented yet.<br>
<br>
>The design notes mention that, like you say, online dedup
is on by<br>
>default. There's no utility to run it offline yet, though
I don't<br>
>know how much of a difference that makes in practice;<br>
<br>
With online dedup, the decision about where to write data is
made before it is written. With offline (bulk) dedup, data is
first written to disk and afterwards either has to be relocated,
or part of it has to be replaced with a reference to the same
data elsewhere, while keeping access overhead minimal.<br>
The author of a btrfs offline deduplicator explained his algorithm
here: <a
href="https://github.com/Zygo/bees/issues/116#issuecomment-549002363"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/Zygo/bees/issues/116#issuecomment-549002363</a><br>
<br>
Does the Hammer2 design somehow facilitate or simplify bulk
deduplication?<br>
<br>
I don't like that only online dedup is available: depending on
circumstances, and with little available RAM, not all files
may be deduplicated in time. Also, I don't know what is kept
in RAM: is the hash of a specific file stored there so that its
copies can be deduplicated?<br>
<br>
Is there a way to see which deduplications have already been
made on an existing filesystem?<br>
<br>
So, in general, online dedup in Hammer2 is currently outside my
control and awareness; I don't even know whether any files on my 1
TB storage were deduplicated.<br>
<br>
><br>
>On Wed, Nov 6, 2019 at 2:25 PM Mikhail Novosyolov<br>
><<a href="mailto:m.novosyolov@rosalinux.ru"
target="_blank" moz-do-not-send="true">m.novosyolov@rosalinux.ru</a>>
wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> I have not been able to find documentation about how
deduplication<br>
>works<br>
>> in Hammer2. I have only found that online
deduplication is now on by<br>
>> default, but there seems to be no description of how it
works.<br>
>><br>
>> Is it possible to do offline deduplication? I mean
finding blocks<br>
>with<br>
>> the same checksum and making one block point to
another instead of<br>
>> spending space on disk twice? There are several
utilities that do<br>
>this<br>
>> on BTRFS in Linux, for example <a
href="https://github.com/Zygo/bees" rel="noreferrer"
target="_blank" moz-do-not-send="true">https://github.com/Zygo/bees</a><br>
>><br>
<br>
-- <br>
Sorry for the brevity, sent from K-9 Mail.<br>
</blockquote>
</div>
</blockquote>
</body>
</html>