Hammer deduplication needs for RAM size
Tomas Bodzar
tomas.bodzar at gmail.com
Sat Apr 23 10:06:23 PDT 2011
On Fri, Apr 22, 2011 at 10:12 PM, Matthew Dillon
<dillon at apollo.backplane.com> wrote:
>
> :Hi all,
> :
> :can someone compare/describe need of RAM size by deduplication in
> :Hammer? There's something interesting about deduplication in ZFS
> :http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html
> :
> :Thx
>
>    The RAM is basically needed to store matching CRCs.  The on-line dedup
> uses a limited fixed-sized hash table to remember CRCs, designed to
> match recently read data with future written data (e.g. 'cp').
>
> The off-line dedup (when you run 'hammer dedup ...' or
>    'hammer dedup-simulate ...') will keep track of ALL data CRCs when
> it scans the filesystem B-Tree. It will happily use lots of swap
> space if it comes down to it, which is probably a bug. But that's
> how it works now.
>
> Actual file data is not persistently cached in memory. It is read only
> when the dedup locates a potential match and sticks around in a limited
> cache before getting thrown away, and will be re-read as needed.
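
If I understand this right, the difference between the two modes could
be sketched roughly like this (just my own illustration to check my
understanding, not the actual HAMMER code; the table size, structure
names and the use of a plain list are all made up):

#include <stdint.h>
#include <stdlib.h>

/*
 * On-line dedup: a fixed-size table of CRCs of recently read data,
 * so memory use stays bounded no matter how much data passes through.
 */
#define ONLINE_TABLE_SIZE       65536   /* made-up size */

struct online_slot {
        uint32_t        crc;            /* CRC of a recently read block */
        uint64_t        data_offset;    /* where that block lives */
        int             valid;
};

static struct online_slot online_table[ONLINE_TABLE_SIZE];

/* Remember the CRC of a block we just read (e.g. the source of a 'cp'). */
static void
online_remember(uint32_t crc, uint64_t data_offset)
{
        struct online_slot *s = &online_table[crc % ONLINE_TABLE_SIZE];

        s->crc = crc;
        s->data_offset = data_offset;
        s->valid = 1;
}

/* On write: is there a candidate block with the same CRC?  The data
 * itself still has to be re-read and compared before deduplicating. */
static int
online_lookup(uint32_t crc, uint64_t *data_offset)
{
        struct online_slot *s = &online_table[crc % ONLINE_TABLE_SIZE];

        if (s->valid && s->crc == crc) {
                *data_offset = s->data_offset;
                return 1;
        }
        return 0;
}

/*
 * Off-line dedup: remember EVERY CRC seen while scanning the B-Tree,
 * so memory use grows with the size of the filesystem (and may spill
 * into swap, as described above).
 */
struct crc_entry {
        uint32_t        crc;
        uint64_t        data_offset;
        struct crc_entry *next;
};

static struct crc_entry *all_crcs;

static int
offline_remember(uint32_t crc, uint64_t data_offset)
{
        struct crc_entry *e = malloc(sizeof(*e));

        if (e == NULL)
                return -1;
        e->crc = crc;
        e->data_offset = data_offset;
        e->next = all_crcs;
        all_crcs = e;
        return 0;
}

int
main(void)
{
        uint64_t off;

        online_remember(0x1234, 4096);          /* block read */
        (void)online_lookup(0x1234, &off);      /* same CRC written later */
        (void)offline_remember(0x1234, 4096);   /* B-Tree scan entry */
        return 0;
}
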
Their discussion continues and they mention a rule of thumb of 1-3GB
of RAM per 1TB of data. Judging by this article
http://blogs.sun.com/roch/entry/dedup_performance_considerations1
it looks like ZFS keeps the dedup table persistently cached in memory.
So is this the reason for the higher RAM usage with ZFS dedup compared
with Hammer?
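
For example, if every in-core dedup-table entry costs somewhere around
320 bytes and the average block size is 128K (rough numbers, I am not
sure they match that article exactly), then 1TB of unique data means
about 8 million blocks and the table alone comes to roughly 2.5GB,
which would explain the 1-3GB per 1TB rule:

#include <stdio.h>

/*
 * Back-of-the-envelope estimate of the ZFS dedup table (DDT) size.
 * The ~320 bytes per in-core entry and the 128K average block size
 * are assumptions, not measured values.
 */
int
main(void)
{
        double data_bytes  = 1024.0 * 1024 * 1024 * 1024; /* 1TB unique data */
        double block_size  = 128.0 * 1024;                /* 128K avg block */
        double entry_bytes = 320.0;                       /* per DDT entry */

        double blocks = data_bytes / block_size;          /* ~8.4M blocks */
        double ddt_gb = blocks * entry_bytes / (1024.0 * 1024 * 1024);

        printf("%.0f blocks -> ~%.1f GB of DDT\n", blocks, ddt_gb);
        return 0;
}
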
>
> -Matt
> Matthew Dillon
> <dillon at backplane.com>
>