Real World DragonFlyBSD Hammer DeDup figures from HiFX - Reclaiming more than 1/4th ( 30% ) Disk Space from an Almost Full Drive
Venkatesh Srinivas
me at endeavour.zapto.org
Fri Jul 22 12:51:57 PDT 2011
> The memory use can be bounded with some additional work on the software,
> if someone wants to have a go at it. Basically the way you limit memory
> use is by dynamically limiting the CRC range that you observe in a pass.
> As you reach a self-imposed memory limit you reduce the CRC range and
> throw away out-of-range records. Once the pass is done you start a new
> pass with the remaining range. Rinse, repeat until the whole thing is
> done.
>
> That would make it possible to run de-dup with bounded memory. However,
> the extra I/O's required to verify duplicate data cannot be avoided.
Currently the dedup code (in sbin/hammer/cmd_dedup.c) kicks off in
hammer_cmd_dedup(); scan_pfs() calls process_btree_elm() for every
data record in the B-Tree. An RB tree of data records, keyed on their
CRCs, is built up along the way. process_btree_elm() has an easy job
-- for every record, it checks for a matching CRC in the tree; if it
finds one, it attempts a dedup ioctl [the kernel performs a full block
comparison, don't worry], and if not, it inserts the record's CRC into
the tree. That tree is where all the memory goes.
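For anyone without the source handy, here is a compressed model of
that flow -- not the actual cmd_dedup.c code, just a sketch on top of
the <sys/tree.h> RB macros, with the entry layout, process_record()
and the sample CRCs made up for illustration:

#include <sys/tree.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t crc_t;

struct dedup_entry {
        RB_ENTRY(dedup_entry) rb_link;
        crc_t data_crc;         /* key: CRC of the data record */
        /* ... block reference of the first record with this CRC ... */
};

static int
dedup_entry_cmp(struct dedup_entry *a, struct dedup_entry *b)
{
        if (a->data_crc < b->data_crc)
                return (-1);
        return (a->data_crc > b->data_crc);
}

RB_HEAD(dedup_entry_rb_tree, dedup_entry) dedup_tree =
        RB_INITIALIZER(&dedup_tree);
RB_GENERATE(dedup_entry_rb_tree, dedup_entry, rb_link, dedup_entry_cmp)

/* Called once per data record found by the B-Tree walk. */
static void
process_record(crc_t crc)
{
        struct dedup_entry key, *de;

        key.data_crc = crc;
        de = RB_FIND(dedup_entry_rb_tree, &dedup_tree, &key);
        if (de == NULL) {
                /* first record with this CRC: remember it */
                de = calloc(1, sizeof(*de));
                de->data_crc = crc;
                RB_INSERT(dedup_entry_rb_tree, &dedup_tree, de);
        } else {
                /*
                 * CRC matches an earlier record; this is where the
                 * dedup ioctl would be attempted. The kernel compares
                 * the actual blocks, so a CRC collision only costs I/O.
                 */
                printf("would try dedup, crc %u\n", (unsigned)crc);
        }
}

int
main(void)
{
        crc_t sample[] = { 7, 3, 7, 12, 3 };
        size_t i;

        for (i = 0; i < sizeof(sample) / sizeof(sample[0]); i++)
                process_record(sample[i]);
        return (0);
}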
There is a really straightforward way to dramatically reduce dedup's
memory use at a time cost -- run a fixed number of passes, each pass
only storing records in the RB tree where (CRC % npasses == passnum).
After each pass, clear out the CRC RB tree. Since duplicate blocks
have identical CRCs, every duplicate pair still meets in the same
pass, so nothing is missed. This does not run dedup in bounded space,
but it is really straightforward to do and can result in dramatic
memory use reductions (16 passes should cut memory use by roughly a
factor of 16, assuming CRCs are evenly spread).
I've done a crude patch to try it, only changing ~20 lines of code in
cmd_dedup.c. Something like this might work (very rough atm):
in hammer_cmd_dedup():

-        scan_pfs(av[0], &process_btree_elm);
+        for (passnum = 0; passnum < npasses; passnum++) {
+                scan_pfs(av[0], &process_btree_elm);
         ...
         assert(RB_EMPTY(&dedup_tree));
         ...
+        } /* for npasses */

in process_btree_elm():

         de = RB_LOOKUP(dedup_entry_rb_tree, &dedup_tree, scan_leaf->data_crc);
         if (de == NULL) {
+                /* this CRC belongs to some other pass: skip it */
+                if (scan_leaf->data_crc % npasses != passnum)
+                        goto end;
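As a quick sanity check on the pass filter itself, a toy standalone
program (nothing HAMMER-specific; the CRC values are invented) showing
that each CRC value lands in exactly one pass, so duplicate records
always meet in the same pass:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        uint32_t crcs[] = { 7, 3, 7, 12, 3, 9, 12, 12 };
        uint32_t npasses = 4, passnum;
        size_t i;

        for (passnum = 0; passnum < npasses; passnum++) {
                printf("pass %u:", (unsigned)passnum);
                for (i = 0; i < sizeof(crcs) / sizeof(crcs[0]); i++) {
                        if (crcs[i] % npasses != passnum)
                                continue;       /* another pass's share */
                        printf(" %u", (unsigned)crcs[i]);
                }
                printf("\n");
        }
        return (0);
}

Each duplicate group (the 7s, 3s and 12s) prints within a single pass,
which is exactly the invariant the patch relies on.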
Running dedup in bounded space is also possible, but would require a
variable number of passes. Imagine having a fixed number of CRC RB
records you are willing to create each pass, MAXCRC. In each pass, you
keep accepting blocks with new CRCs into the RB tree until you've
accepted MAXCRC of them. Then you record the highest accepted CRC for
that pass and continue walking the disk, deduping blocks with matching
CRCs but not accepting new ones into the tree. On the next pass, you
accept records with CRCs higher than the highest one you accepted on
the last pass, again up to MAXCRC. Between passes, you clear the CRC
RB tree. (One wrinkle: a new CRC below the high-water mark that first
shows up after the tree fills would slip through untracked; the fix is
to evict the tree's highest entry to make room when that happens --
the range-shrinking Matt describes above.)
Example:

Let's say I can have two CRCs in my tree (I have an old computer) and
my FS has records with CRCs: [A B C A A B B D C C D D E].

On pass one, I'd store A and B in my RB tree as I see records, and
record B as the highest CRC I accepted; then I'd finish my disk walk,
deduping all the As and Bs. On pass two, I'd see a C and later a D;
I'd dedup the C and D blocks on this pass and record D. So on and on
until I've dedup-ed the highest CRC on disk.
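Here is a sketch of one such pass, reusing the dedup_entry tree from
the sketch earlier in this mail; MAXCRC, low_water and
process_record_bounded() are made-up names. It implements the evicting
variant noted above: when the tree is full and a lower CRC arrives,
the highest entry is thrown out (the range-shrinking from Matt's
mail), so the tree always holds the lowest not-yet-handled CRCs:

#define MAXCRC  2               /* self-imposed memory limit */

static crc_t low_water;         /* CRCs <= this were finished earlier */
static int nentries;            /* entries currently in the tree */
static int deferred;            /* pushed any CRC to a later pass? */

static void
process_record_bounded(crc_t crc)
{
        struct dedup_entry key, *de, *hi;

        if (low_water != 0 && crc <= low_water)
                return;                 /* handled in an earlier pass */

        key.data_crc = crc;
        de = RB_FIND(dedup_entry_rb_tree, &dedup_tree, &key);
        if (de != NULL) {
                /* duplicate of an accepted CRC: try the dedup ioctl */
                return;
        }
        if (nentries == MAXCRC) {
                hi = RB_MAX(dedup_entry_rb_tree, &dedup_tree);
                if (crc > hi->data_crc) {
                        deferred = 1;   /* out of range: later pass */
                        return;
                }
                /* shrink the range: evict the highest CRC instead */
                RB_REMOVE(dedup_entry_rb_tree, &dedup_tree, hi);
                free(hi);
                nentries--;
                deferred = 1;
        }
        de = calloc(1, sizeof(*de));
        de->data_crc = crc;
        RB_INSERT(dedup_entry_rb_tree, &dedup_tree, de);
        nentries++;
}

After each walk, low_water becomes the data_crc of the highest entry
left in the tree, the tree is emptied, and the walk repeats while
deferred was set. An evicted CRC simply gets re-accepted on a later
pass, at the cost of a little repeated scanning, and the memory bound
stays exact.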
This would be a pretty neat way to do dedup! But it would involve more
work than the fixed-npasses approach. Either would be a pretty good
project for someone who wanted to get started
<strike>breaking</strike> working on DragonFly. There is very little
that can go wrong with dedup strategies -- the kernel validates all
data records before dedup-ing. In fact, a correct (if stupid) approach
that'd involve nearly no memory commitment would be to run dedup
ioctls for every data record against every other data record...
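For concreteness, that strawman as a tiny standalone program; record[]
and try_dedup() are stand-ins for the real B-Tree scan and the dedup
ioctl (the kernel compares the actual blocks, so wrong guesses only
cost I/O):

#include <stdint.h>
#include <stdio.h>

struct rec {
        uint32_t data_crc;
};

/* Stand-in for the dedup ioctl; the real one verifies the blocks. */
static void
try_dedup(struct rec *a, struct rec *b)
{
        printf("try dedup: crcs %u and %u\n",
            (unsigned)a->data_crc, (unsigned)b->data_crc);
}

int
main(void)
{
        struct rec record[] = { {7}, {3}, {7}, {12}, {3} };
        size_t i, j, nrecords = sizeof(record) / sizeof(record[0]);

        /* No tree at all: O(n^2) ioctl attempts, near-zero memory. */
        for (i = 0; i < nrecords; i++)
                for (j = i + 1; j < nrecords; j++)
                        try_dedup(&record[i], &record[j]);
        return (0);
}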
-- vs