<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>The problem here is that the user can't control the buffer, and that
      it is reset on reboot, for example. In real use cases (mine at
      least) this prevents deduplication from having any effect: it
      does not happen because the buffer is empty.<br>

    </p>

    <div class="moz-cite-prefix">On 09.11.2019 at 10:45, Matthew Dillon wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAOZ7CpChJE8GkmCZuz2LUfxe2UEF3oNBjCRN50T8qXQqUO-X7g@mail.gmail.com">


      <div dir="ltr">H2's de-duplication only works on live data; there

        is no background/off-line de-dup operation.  This generally

        works when you do something like a 'cp' operation, that is when

        copying files or whole directory trees.   Basically if H2 sees

        data that is already in the buffer cache that matches new data

        being written to the filesystem, it will attempt to reference

        the existing block(s) instead of writing out new blocks with the

        same data.

        <div>

          <div>

            <div><br>

            </div>

            <div>-Matt</div>

          </div>

        </div>

      </div>
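      <p>The behaviour described above can be illustrated with a toy
        model. This is only a sketch under stated assumptions, not H2's
        actual implementation: the class name <code>ToyLiveDedupStore</code>,
        the SHA-256 digest, the 4-byte block size and the LRU eviction
        policy are all hypothetical choices made for illustration. It
        shows that a write is deduplicated only while matching data is
        still in a volatile cache, and that nothing is deduplicated once
        that cache has been emptied.</p>

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4  # unrealistically small, so the demo is easy to follow


class ToyLiveDedupStore:
    """Toy block store that dedups only against recently seen blocks."""

    def __init__(self, cache_slots):
        self.cache_slots = cache_slots  # how many digests the volatile cache holds
        self.blocks = []                # stands in for blocks on disk
        self.recent = OrderedDict()     # digest to block index, lost on reboot

    def write(self, data):
        """Store data block by block; return the block indexes it references."""
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            idx = self.recent.get(digest)
            if idx is not None and self.blocks[idx] == block:
                # Matching data is still cached: reference the existing block.
                self.recent.move_to_end(digest)
            else:
                # No cached match: write a new block and remember its digest.
                self.blocks.append(block)
                idx = len(self.blocks) - 1
                self.recent[digest] = idx
                self.recent.move_to_end(digest)
                # Evict the oldest entries once the cache is over capacity.
                for _ in range(max(len(self.recent) - self.cache_slots, 0)):
                    self.recent.popitem(last=False)
            refs.append(idx)
        return refs

    def reboot(self):
        """Simulate a reboot: on-disk blocks survive, the cache does not."""
        self.recent.clear()


store = ToyLiveDedupStore(cache_slots=8)
first = store.write(b"AAAABBBB")    # two new blocks
second = store.write(b"AAAABBBB")   # both blocks found in the cache
store.reboot()
third = store.write(b"AAAABBBB")    # cache is empty, data is stored again
print(first, second, third, len(store.blocks))
```

      <p>In this model the second write reuses the blocks of the first,
        while the write after <code>reboot()</code> stores the same data
        a second time because the cache no longer knows about it.</p>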

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Thu, Nov 7, 2019 at 2:07 PM

          Mikhail Novosyolov <<a

            href="mailto:m.novosyolov@rosalinux.ru"

            moz-do-not-send="true">m.novosyolov@rosalinux.ru</a>>

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

          <br>

          On 7 November 2019 at 17:31:51 GMT+03:00, Justin Sherrill <<a
            href="mailto:justin@shiningsilence.com" target="_blank"
            moz-do-not-send="true">justin@shiningsilence.com</a>>
          wrote:<br>

          >Are you looking for how it works technically, or how to

          operate it?<br>

          <br>

          Both.<br>

          <br>

          >The design document goes into the technical details of how

          it's set<br>

          >up:<br>

          ><br>

          ><a

href="http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN"

            rel="noreferrer" target="_blank" moz-do-not-send="true">http://gitweb.dragonflybsd.org/dragonfly.git/blob_plain/HEAD:/sys/vfs/hammer2/DESIGN</a><br>

          <br>

          Thanks. I had forgotten about this document. I searched it for
          "dedup" and "de-dup": "offline" deduplication is called "bulk"
          deduplication there, and the document states that bulk dedup
          is possible but not implemented yet.<br>

          <br>

          >The design notes mention that, like you say, online dedup

          is on by<br>

          >default.  There's no utility to run it offline yet, though

          I don't<br>

          >know how much of a difference that makes in practice;<br>

          <br>

          With online dedup, the decision about where to write data is
          made before the data is written. With offline (bulk) dedup,
          data is first written to disk and then must either be
          relocated or have part of it replaced with a reference to the
          same data elsewhere, while keeping access overhead minimal.<br>

          The author of a btrfs offline deduplicator explained his algorithm

          here: <a

            href="https://github.com/Zygo/bees/issues/116#issuecomment-549002363"

            rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/Zygo/bees/issues/116#issuecomment-549002363</a><br>
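          <p>The bulk approach described above can be sketched as a
            pass over blocks that are already on disk. This is a minimal
            illustration under stated assumptions, not the algorithm
            used by bees or planned for Hammer2: the function name
            <code>bulk_dedup</code>, the in-memory list standing in for
            disk blocks and the SHA-256 digest are hypothetical. Blocks
            are grouped by digest, compared byte for byte, and every
            file reference is remapped to one surviving copy.</p>

```python
import hashlib


def bulk_dedup(blocks, refs):
    """Return (new_blocks, new_refs) with duplicate blocks merged.

    blocks: list of bytes objects, standing in for blocks already on disk.
    refs:   dict mapping a file name to the list of block indexes it uses.
    """
    canonical = {}   # digest to index of the surviving copy in new_blocks
    remap = {}       # old block index to new block index
    new_blocks = []
    for old_idx, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        new_idx = canonical.get(digest)
        # Compare the bytes as well, so a digest collision never merges
        # two blocks that actually differ.
        if new_idx is None or new_blocks[new_idx] != block:
            new_blocks.append(block)
            new_idx = len(new_blocks) - 1
            canonical.setdefault(digest, new_idx)
        remap[old_idx] = new_idx
    # Point every file at the surviving copies.
    new_refs = {name: [remap[i] for i in idxs] for name, idxs in refs.items()}
    return new_blocks, new_refs


blocks = [b"AAAA", b"BBBB", b"AAAA", b"CCCC", b"BBBB"]
refs = {"a": [0, 1], "b": [2, 3, 4]}
new_blocks, new_refs = bulk_dedup(blocks, refs)
print(len(blocks), len(new_blocks), new_refs)
```

          <p>Unlike the online case, this pass needs no cache of recent
            writes, but it has to read every block and rewrite
            references after the fact.</p>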

          <br>

          Does the Hammer2 design somehow facilitate or simplify bulk
          deduplication?<br>

          <br>

          I don't like that only online dedup is available, because
          under various circumstances, or with little available RAM,
          not all files may be deduplicated in time. Also, I don't know
          what is stored in RAM: is the hash of a specific file kept
          there for deduplicating its copies?<br>

          <br>

          Is there a way to see the deduplications already made on an
          existing filesystem?<br>

          <br>

          So, in general, online dedup in Hammer2 is for now outside my
          control and awareness; I don't even know whether any files in
          my 1 TB storage were deduplicated.<br>

          <br>

          ><br>

          >On Wed, Nov 6, 2019 at 2:25 PM Mikhail Novosyolov<br>

          ><<a href="mailto:m.novosyolov@rosalinux.ru"

            target="_blank" moz-do-not-send="true">m.novosyolov@rosalinux.ru</a>>

          wrote:<br>

          >><br>

          >> Hi,<br>

          >><br>

          >> I have not been able to find documentation about how

          deduplication<br>

          >works<br>

          >> in Hammer2. I have only found that online

          deduplication is now on by<br>

          >> default, but there seems to be no description of how it

          works.<br>

          >><br>

          >> Is it possible to do offline deduplication? I mean

          finding blocks<br>

          >with<br>

          >> the same checksum and making one block point to

          another instead of<br>

          >> spending space on disk twice? There are several

          utilities that do<br>

          >this<br>

          >> on BTRFS in Linux, for example <a

            href="https://github.com/Zygo/bees" rel="noreferrer"

            target="_blank" moz-do-not-send="true">https://github.com/Zygo/bees</a><br>

          >><br>

          <br>

          -- <br>

          Sorry for the brevity, sent from K-9 Mail.<br>

        </blockquote>

      </div>

    </blockquote>

  </body>

</html>