Real World DragonFlyBSD Hammer DeDup figures from HiFX - Reclaiming more than 1/4th ( 30% ) Disk Space from an Almost Full Drive

Dean Hamstead dean at fragfest.com.au
Tue Jul 19 05:51:24 PDT 2011


I would be interested to see how this compares to other deduplication implementations.

D



On 19/07/2011, at 9:06 PM, Siju George <sgeorge.ml at gmail.com> wrote:

> There were some copy-paste mistakes in the first one. Here is the updated one.
> 
> Hi,
> 
> After a long busy season I finally got free to work on my DragonFlyBSD
> backup servers.
> One of the backup servers holds around 10 years of company archives.
> 
> Short summary before dedup of the first hard disk
> 
> Filesystem                Size   Used  Avail Capacity  Mounted on
> Backup1                   454G   451G   2.8G    99%    /Backup1
> 
> Short summary after dedup of the first hard disk
> 
> Filesystem                Size   Used  Avail Capacity  Mounted on
> Backup1                   454G   313G   141G    69%    /Backup1
> 
> Reclaimed 138 GB, i.e. 30% of disk space, without deleting anything or
> considerably affecting the performance of the server.
> 
> Full Story:
> 
> The first backup server ran Debian Sarge, then Debian Etch, and then
> OpenBSD with RAIDFRAME mirrors, because it was the only Unix/Linux that
> would even detect the 120 GB hard disks we had back then.
> Later I turned to DragonFlyBSD because of HAMMER (no fsck, no RAID parity
> checks, and easy FS snapshots).
> So this DragonFly backup server holds around 10 years of backups of:
> 
> 1) Web files of Projects ( html, php, images etc )
> 
> 2) SQL dumps, both zipped and unzipped. HAMMER snapshots gave me the
> luxury to do
> 
> http://www.dragonflybsd.org/docs/real_time_backup_server_for_microsoft_windows__44___linux__44___bsd_and_mac_os_x_clients/
> 
> But now we have SQL dumps of individual databases taken every hour and
> made available to the developers using snapshots in the same manner
> :-)
> 
> 3) MS Word and Excel files - company documents and user backups
> 
> 4) PSD files and such from designers, which take up a lot of space.
> 
> 5) Git, SVN repositories backup
> 
> 6) Virtual Machine images ( mostly qcow2 )
> 
> 7) Configuration files of several servers and other details, backed up
> daily, hourly, or sometimes every 15 minutes, and maintained with
> coarse-grained snapshots without pruning.
> 
> 8) Several software packages and CD ISO images
> 
> 9) Video/audio files such as mp3, avi, flv, mpg and so on.
> 
> 
> The OS version currently is
> 
> DragonFly v2.11.0.247.gda17d9-DEVELOPMENT
> 
> Processor is
> 
> AMD Athlon(tm) 64 Processor 3400+ (2193.63-MHz 686-class CPU)
> 
> Memory is
> 
> real memory  = 2113336320 (2015 MB)
> avail memory = 2029342720 (1935 MB)
> 
> with four 500 GB SATA disks mirroring PFSes from each other and also from
> another DragonFly backup server on a different floor, using
> 'mirror-stream' started at boot by cron with an entry similar to
> 
> @reboot /sbin/hammer mirror-stream /Backup1/Data /Backup2/Data &
> 
> 
> I have never reinstalled the OS but kept following the development
> version from July 2009 so that is two years of rolling release which
> is a great advantage in itself :-)
> 
> The first Disk is mounted as /Backup1 and seems to be a good Candidate
> for dedup because it is almost full.
> 
> ======================================================================================
> Filesystem                Size   Used  Avail Capacity  Mounted on
> Backup1                   454G   451G   2.8G    99%    /Backup1
> /Backup1/pfs/@@-1:00001   454G   451G   2.8G    99%    /Backup1/Data
> /Backup1/pfs/@@-1:00009   454G   451G   2.8G    99%    /Backup1/pkgsrc
> /Backup1/pfs/@@-1:00002   454G   451G   2.8G    99%    /Backup1/VersionControl
> /Backup1/pfs/@@-1:00003   454G   451G   2.8G    99%    /Backup1/test
> /Backup1/pfs/@@-1:00005   454G   451G   2.8G    99%    /Backup1/www-5mbak/www-hot
> /Backup1/pfs/@@-1:00006   454G   451G   2.8G    99%    /Backup1/mysql-1hbak/mysql-hot
> /Backup1/pfs/@@-1:00007   454G   451G   2.8G    99%    /Backup1/project-docs-bak/project-docs
> =======================================================================================
> 
> Full Details below.
> 
> =========================================================
> 
>        Label               Backup1
>        No. Volumes         1
>        FSID                e182...............................................
>        HAMMER Version      4
> Big block information
>        Total           58140
>        Used            57713 (99.27%)
>        Reserved           69 (0.12%)
>        Free              358 (0.62%)
> Space information
>        No. Inodes   11350364
>        Total size       454G (487713669120 bytes)
>        Used             451G (99.27%)
>        Reserved         552M (0.12%)
>        Free             2.8G (0.62%)
> PFS information
>        PFS ID  Mode    Snaps  Mounted on
>             0  MASTER      0  /Backup1
>             1  MASTER      0  /Backup1/Data
>             2  MASTER      0  /Backup1/VersionControl
>             3  MASTER      0  /Backup1/test
>             5  MASTER      0  /Backup1/www-5mbak/www-hot
>             6  MASTER      0  /Backup1/mysql-1hbak/mysql-hot
>             7  MASTER      0  /Backup1/project-docs-bak/project-docs
>             9  MASTER      0  /Backup1/pkgsrc
> ==========================================================
> 
> 
> De Duping Steps Taken:
> ----------------------------------
> 
> 
> 1) Version Upgrading from 4 to 6.
> 
> =================================
> dfly-bkpsrv# hammer version-upgrade /Backup1 5
> hammer version-upgrade: succeeded
> dfly-bkpsrv# hammer version-upgrade /Backup1 6
> hammer version-upgrade: succeeded
> =================================
> 
> 2) Simulating using 'dedup-simulate' to get an idea.
> 
> =====================================================================================
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1
> Dedup-simulate /Backup1: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 0
> Dedup-simulate /Backup1 succeeded
> Simulated dedup ratio = 1.07
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/Data
> Dedup-simulate /Backup1/Data: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 1
> Dedup-simulate /Backup1/Data succeeded
> Simulated dedup ratio = 1.34
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/pkgsrc
> Dedup-simulate /Backup1/pkgsrc: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 9
> Dedup-simulate /Backup1/pkgsrc succeeded
> Simulated dedup ratio = 1.10
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/VersionControl
> Dedup-simulate /Backup1/VersionControl: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 2
> Dedup-simulate /Backup1/VersionControl succeeded
> Simulated dedup ratio = 2.79
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/test
> Dedup-simulate /Backup1/test: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 3
> Dedup-simulate /Backup1/test succeeded
> Simulated dedup ratio = 0.00
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/www-5mbak/www-hot
> Dedup-simulate /Backup1/www-5mbak/www-hot: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 5
> Dedup-simulate /Backup1/www-5mbak/www-hot succeeded
> Simulated dedup ratio = 1.39
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/mysql-1hbak/mysql-hot
> Dedup-simulate /Backup1/mysql-1hbak/mysql-hot: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 6
> Dedup-simulate /Backup1/mysql-1hbak/mysql-hot succeeded
> Simulated dedup ratio = 13.78
> 
> dfly-bkpsrv# hammer dedup-simulate /Backup1/project-docs-bak/project-docs
> Dedup-simulate /Backup1/project-docs-bak/project-docs: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 7
> Dedup-simulate /Backup1/project-docs-bak/project-docs succeeded
> Simulated dedup ratio = 1.15
> 
> ===================================================================================================
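A simulated ratio is easier to judge once it is turned into a share of space. The sketch below assumes the ratio means "data referenced divided by data actually allocated", which the quoted output suggests but does not state. The function name and the dictionary are illustrative only; the ratios are copied from the dedup-simulate runs above.

```python
def space_saved_fraction(ratio: float) -> float:
    """Fraction of the referenced data that would no longer need its own
    storage, assuming ratio = referenced / allocated (an assumption)."""
    if ratio <= 1.0:
        # Covers the empty PFS (ratio 0.00) and "nothing to share" (1.00).
        return 0.0
    return 1.0 - 1.0 / ratio


# Simulated ratios copied from the dedup-simulate output quoted above.
simulated = {
    "/Backup1": 1.07,
    "/Backup1/Data": 1.34,
    "/Backup1/pkgsrc": 1.10,
    "/Backup1/VersionControl": 2.79,
    "/Backup1/test": 0.00,
    "/Backup1/www-5mbak/www-hot": 1.39,
    "/Backup1/mysql-1hbak/mysql-hot": 13.78,
    "/Backup1/project-docs-bak/project-docs": 1.15,
}

for pfs, ratio in simulated.items():
    saved = space_saved_fraction(ratio)
    print(f"{pfs:40s} ratio {ratio:5.2f} -> about {saved:.0%} saved")
```

Under that assumption, a ratio of 1.34 corresponds to roughly a quarter of the PFS's data being shared, while 13.78 (the hourly SQL dumps) corresponds to more than nine tenths.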
> 
> 3) Real 'de-dup' of the Mother File System and all PFSes
> 
> =======================================================================
> 
> dfly-bkpsrv# hammer dedup /Backup1
> Dedup /Backup1: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 0
> Dedup /Backup1 succeeded
> Dedup ratio = 1.07
>      625 MB referenced
>      585 MB allocated
>      224 KB skipped
>           0 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/Data
> Dedup /Backup1/Data: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 1
> Dedup /Backup1/Data succeeded
> Dedup ratio = 1.34
>      259 GB referenced
>      193 GB allocated
>       40 MB skipped
>        1944 CRC collisions
>           0 SHA collisions
>          20 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/pkgsrc
> Dedup /Backup1/pkgsrc: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 9
> Dedup /Backup1/pkgsrc succeeded
> Dedup ratio = 1.10
>     1687 MB referenced
>     1539 MB allocated
>     1718 KB skipped
>           3 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/VersionControl
> Dedup /Backup1/VersionControl: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 2
> Dedup /Backup1/VersionControl succeeded
> Dedup ratio = 2.75
>      160 MB referenced
>       58 MB allocated
>      853 KB skipped
>           0 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/test
> Dedup /Backup1/test: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 3
> Dedup /Backup1/test succeeded
> Dedup ratio = 0.00
>         0 B referenced
>         0 B allocated
>         0 B skipped
>           0 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/www-5mbak/www-hot
> Dedup /Backup1/www-5mbak/www-hot: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 5
> Dedup /Backup1/www-5mbak/www-hot succeeded
> Dedup ratio = 1.39
>       50 GB referenced
>       36 GB allocated
>       53 MB skipped
>         167 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/mysql-1hbak/mysql-hot
> Dedup /Backup1/mysql-1hbak/mysql-hot: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 6
> Dedup /Backup1/mysql-1hbak/mysql-hot succeeded
> Dedup ratio = 13.78
>      117 GB referenced
>     8747 MB allocated
>         0 B skipped
>           0 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> 
> dfly-bkpsrv# hammer dedup /Backup1/project-docs-bak/project-docs
> Dedup /Backup1/project-docs-bak/project-docs: objspace 8000000000000000:0000 7fffffffffffffff:ffff pfs_id 7
> Dedup /Backup1/project-docs-bak/project-docs succeeded
> Dedup ratio = 1.15
>      247 MB referenced
>      215 MB allocated
>      102 KB skipped
>           0 CRC collisions
>           0 SHA collisions
>           0 bigblock underflows
> =================================================================================================
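The reported ratios can be cross-checked against the "referenced" and "allocated" figures printed with them. This is only a sanity check under two assumptions: that the ratio is referenced divided by allocated, and that the units are binary (1 GB = 1024 MB). The helper names are made up for the example; the numbers are copied from the output above.

```python
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units


def to_bytes(text: str) -> int:
    """Convert a size such as '8747 MB' to bytes."""
    value, unit = text.split()
    return int(value) * UNITS[unit]


def recomputed_ratio(referenced: str, allocated: str) -> float:
    """referenced / allocated, or 0.0 for an empty PFS."""
    alloc = to_bytes(allocated)
    return to_bytes(referenced) / alloc if alloc else 0.0


# (referenced, allocated, ratio as printed by 'hammer dedup')
reported = {
    "/Backup1": ("625 MB", "585 MB", 1.07),
    "/Backup1/Data": ("259 GB", "193 GB", 1.34),
    "/Backup1/pkgsrc": ("1687 MB", "1539 MB", 1.10),
    "/Backup1/VersionControl": ("160 MB", "58 MB", 2.75),
    "/Backup1/test": ("0 B", "0 B", 0.00),
    "/Backup1/www-5mbak/www-hot": ("50 GB", "36 GB", 1.39),
    "/Backup1/mysql-1hbak/mysql-hot": ("117 GB", "8747 MB", 13.78),
    "/Backup1/project-docs-bak/project-docs": ("247 MB", "215 MB", 1.15),
}

for pfs, (ref, alloc, printed) in reported.items():
    ratio = recomputed_ratio(ref, alloc)
    print(f"{pfs:40s} printed {printed:5.2f}  recomputed {ratio:5.2f}")
```

The recomputed values land within about one percent of the printed ones. The small differences are what one would expect from the sizes being shown as whole numbers.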
> 
> Now, after de-duping all PFSes on the first disk, 'df -h' gives these details:
> 
> Filesystem                Size   Used  Avail Capacity  Mounted on
> Backup1                   454G   313G   141G    69%    /Backup1
> 
> Before de-duping it was
> 
> Filesystem                Size   Used  Avail Capacity  Mounted on
> Backup1                   454G   451G   2.8G    99%    /Backup1
> 
> So that is reclaiming 30% of Disk space amounting to 138 GB :-)
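The 138 GB and 30% figures follow directly from the two 'df -h' outputs. A minimal worked check, using only the rounded values df printed (the variable names are mine):

```python
# Values copied from the 'df -h' lines before and after dedup, in GB.
size_gb = 454
used_before_gb = 451
used_after_gb = 313

reclaimed_gb = used_before_gb - used_after_gb
share_of_disk = reclaimed_gb / size_gb          # relative to total size
share_of_used = reclaimed_gb / used_before_gb   # relative to space used before

print(f"reclaimed: {reclaimed_gb} GB")
print(f"share of total disk size: {share_of_disk:.1%}")
print(f"share of previously used space: {share_of_used:.1%}")
```

Either way of measuring gives a little over 30%, which also matches the drop in the Capacity column from 99% to 69%.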
> 
> Carefully designing and configuring PFSes and snapshots can save a lot of
> disk space, but de-dup can save still more :-)
> 
> 
> In order to 'de-dup' the file system automatically every day via the
> 'hammer cleanup' run from the periodic script, I have put something like
> this in the configuration of each PFS.
> 
> =============================================
> dfly-bkpsrv# hammer config /Backup1/VersionControl/
> snapshots 1d 1000d
> prune     1d 15m
> rebalance 1d 5m
> reblock   1d 60m
> recopy    30d 60m
> dedup     1d 30m
> ==============================================
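To see how much time this configuration allows 'hammer cleanup' per day, the lines can be read as "task, how often, time limit". That reading is my understanding of hammer(8) and should be treated as an assumption, as should the point that the second value on the snapshots line is a retention time rather than a runtime limit. The parser below is a hypothetical helper, not part of hammer.

```python
import re

# Configuration copied from the 'hammer config' output above.
CONFIG = """\
snapshots 1d 1000d
prune     1d 15m
rebalance 1d 5m
reblock   1d 60m
recopy    30d 60m
dedup     1d 30m
"""

UNIT_MINUTES = {"m": 1, "h": 60, "d": 24 * 60}


def to_minutes(token: str) -> int:
    """Convert tokens such as '30m' or '1d' to minutes."""
    match = re.fullmatch(r"(\d+)([mhd])", token)
    if match is None:
        raise ValueError(f"unrecognised duration: {token!r}")
    return int(match.group(1)) * UNIT_MINUTES[match.group(2)]


def parse_config(text: str) -> dict:
    """Map each task name to (period_minutes, second_field_minutes)."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, period, second = fields
        entries[name] = (to_minutes(period), to_minutes(second))
    return entries


entries = parse_config(CONFIG)

# Sum the limits of tasks that run daily; 'snapshots' is left out because
# its second field is assumed to be a retention time, not a runtime limit.
daily_budget = sum(
    limit
    for name, (period, limit) in entries.items()
    if name != "snapshots" and period == UNIT_MINUTES["d"]
)
print(f"daily maintenance budget: {daily_budget} minutes")
```

On that reading, the daily tasks on this PFS may take up to 110 minutes in total, 30 of which are for dedup.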
> 
> A million thanks to Matt and team for DragonFly, Hammer, de-dup,
> vkernel and a lot of other goodies coming up :-D
> 
> Thanks and Regards
> 
> --Siju
> 





