BSD Magazine article on DragonFlyBSD hammer filesystem volume management

Tomohiro Kusumi kusumi.tomohiro at gmail.com
Sat Oct 7 09:08:05 PDT 2017


Hi,
Regarding the LVM part,

2017-10-07 7:34 GMT+03:00 Predrag Punosevac <punosevac72 at gmail.com>:
> Siju George wrote:
>
>
>> Hi,
>>
>> I have written an article about DragonFlyBSD hammer filesystem volume
>> management. This will be particularly useful for Linux users familiar
>> with lvm.
>>
>> https://bsdmag.org/download/military-grade-data-wiping-freebsd-bcwipe
>>
>> Thanks
>>
>> --Siju
>
> I just read the article twice and I am completely confused (probably to
> be expected from a guy who is trained in theoretical mathematics and
> astronomy). I would like it if somebody could correct my understanding of
> things.
>
> On the most basic level OSs manage disks through device nodes. For many
> years engineers created filesystems assuming that the filesystem would
> sit directly on the physical disk. The proliferation of commodity
> hardware forced a re-evaluation of this idea. Namely, not all drives
> have the same rotational speed, number of platters, etc. So, much like
> the French approach in mathematics, somebody came up with the idea of
> abstracting the physical devices and stopped lying to OSs about the
> geometry of hard drives. That is why all modern HDDs use Logical Block
> Addressing (LBA). Abstraction didn't end there.
>
> On the next abstraction layer we have disk slicing (Linux guys, oblivious
> to BSD partitions, call these slices partitions). I am semi-familiar with
> two slicing schemes. One is the old MBR scheme, still the default on
> OpenBSD, and the other is the GPT scheme. On DragonFly one should always
> use the GPT scheme; see man gpt. For example
>
> dfly# gpt show /dev/da3
>       start       size  index  contents
>           0          1      -  PMBR
>           1          1      -  Pri GPT header
>           2         32      -  Pri GPT table
>          34          1      0  GPT part - DragonFly Label64
>          35  976773100      1  GPT part - DragonFly Label64
>   976773135         32      -  Sec GPT table
>   976773167          1      -  Sec GPT header
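
Side note for readers coming from Linux: a layout like the one quoted
above is normally produced with gpt(8). Roughly (da3 is just the example
device from the article; indexes and sizes will differ on your disk, so
double-check the man page before copying this):

dfly# gpt create da3      # write the PMBR plus empty primary/secondary GPT
dfly# gpt add da3         # add a partition covering the free space
dfly# gpt show da3        # verify the result
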
>
>
> On Linux one uses parted for the GPT scheme while FreeBSD uses gpart.
> Once you slice your disk (typically Linux will do three slices, while
> historically DragonFly uses only one slice (dangerously dedicated)
> or two slices like in the example above) we can start talking about BSD
> partitions. On the above hard drive the data is located in the slice
> indexed by 1. The slice with index 0 is created for historical reasons so
> that the drive is not dangerously dedicated.
>
>
> A slice can be further divided into BSD partitions (up to 16) on
> DragonFly (Linux has no equivalent) using the disklabel64 command. Let's
> look at how the slices on the above disk are divided
>
> dfly# disklabel64 -r /dev/da3s0
> disklabel64: /dev/da3s0: No such file or directory
> dfly# disklabel64 -r /dev/da3s1
> # /dev/da3s1:
> #
> # Informational fields calculated from the above
> # All byte equivalent offsets must be aligned
> #
> # boot space:    1059328 bytes
> # data space:  488385505 blocks # 476938.97 MB (500106757632 bytes)
> #
> # NOTE: If the partition data base looks odd it may be
> #       physically aligned instead of slice-aligned
> #
> diskid: a6a0a2ef-a4d1-11e7-98d9-b9aeed3cce35
> label:
> boot2 data base:      0x000000001000
> partitions data base: 0x000000103a00
> partitions data stop: 0x007470bfc000
> backup label:         0x007470bfc000
> total size:           0x007470bfd800    # 476939.99 MB
> alignment: 4096
> display block size: 1024        # for partition display only
>
> 16 partitions:
> #          size     offset    fstype   fsuuid
>   a:  104857600          0    4.2BSD    #  102400.000MB
>   b:  262144000  104857600    4.2BSD    #  256000.000MB
>   a-stor_uuid: 5ad35da2-a4d2-11e7-98d9-b9aeed3cce35
>   b-stor_uuid: 5ad35dba-a4d2-11e7-98d9-b9aeed3cce35
>
>
> As you can see the first slice contains no partitions. The second slice
> is partitioned into two parts,
>
> da3s1a and da3s1b
>
> One could put a file system, UFS or HAMMER, on those partitions. It is
> entirely possible to have different file systems on the partitions. For
> example da3s1a is formatted to use UFS while da3s1b is formatted to use
> HAMMER. DragonFly's cousin FreeBSD can do the same. Linux can't.
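
To make that concrete, the rough sequence on DragonFly would look
something like the following (the partition letters, the DATA label and
the mount points are only illustrative, they are not from the article):

dfly# disklabel64 -r -w da3s1 auto      # write a fresh label to the slice
dfly# disklabel64 -e da3s1              # add the a: and b: partitions in the editor
dfly# newfs /dev/da3s1a                 # UFS on the first partition
dfly# newfs_hammer -L DATA /dev/da3s1b  # HAMMER on the second one
dfly# mount /dev/da3s1a /mnt/ufs
dfly# mount_hammer /dev/da3s1b /mnt/hammer
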
>
> On Linux a slice such as sda1 can't be further subdivided and must contain
> a file system like XFS. Not so fast. On Linux some people decided to
> add another abstraction layer called the logical volume manager (LVM).

DragonFly has LVM too.
To be exact it's a port of Linux's LVM implementation called LVM2,
along with its kernel-side subsystem Device Mapper, which is an
abstraction layer for various block-device-level features, like LVM,
snapshots, software RAID, multipath, and much more.

The problem with DragonFly is that most of the (useful) DM target
drivers are either not working or not ported.
I'm not even sure if LVM2 is stable, let alone software RAID.


> One can
> combine several physical volumes (not recommended, as software RAID
> should be used for that) into an LVM volume group. A volume group is
> further divided into logical volumes. The file system goes on top of
> the logical volumes. A cool thing about logical volumes is that
> they can be expanded and shrunk while mounted. Actually the coolest thing
> is that one can take a snapshot of a logical volume, although I
> have not seen many Linux users do that in practice. In terms of
> file systems the only real choice is XFS. Historically LVM didn't
> provide any options for redundancy or parity, thus software RAID (mdadm)
> or hardware RAID was needed.  It is perfectly OK to initiate a RAID
> /dev/md0 as a volume group, partition it into a few logical volumes and
> then put XFS on top of it.
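
Since the article targets Linux LVM users, the stack described above
written out as commands would look roughly like this (the device names,
sizes and the vg0/data names are made up for illustration only):

linux# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
linux# pvcreate /dev/md0                    # the RAID device becomes a physical volume
linux# vgcreate vg0 /dev/md0                # ...inside a volume group
linux# lvcreate -L 100G -n data vg0         # ...carved into logical volumes
linux# mkfs.xfs /dev/vg0/data
linux# lvextend -L +50G /dev/vg0/data       # grow the LV while mounted...
linux# xfs_growfs /mnt/data                 # ...then grow XFS to match
linux# lvcreate -s -L 10G -n data-snap /dev/vg0/data   # snapshot of the LV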

LVM is usually a different thing from RAID.
In the case of LVM on Linux (which is DM-based LVM2), you can stack DM
devices to synthesize these (or other block-level features provided by
the DM infrastructure).
You can stack DM devices on DragonFly too, but there aren't that many
good and/or stable DM target drivers worth stacking...
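
For what "stacking" means at the lowest level, here is a minimal sketch
using the raw dmsetup tool that ships with LVM2 (the table numbers and
the device name are invented, and on DragonFly only a few targets such
as linear are likely to behave):

linux# dmsetup create lin --table '0 204800 linear /dev/sdb 0'
linux# dmsetup ls            # the new device appears under /dev/mapper/lin
linux# dmsetup remove lin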


> Going back to cousin FreeBSD now. Unlike Linux, FreeBSD uses GEOM instead
> of LVM. My understanding is that GEOM combines LVM, RAID, and even
> encryption (I am not sure whether GELI is part of GEOM) into one.
> GEOM used as LVM allows for UFS journaling.
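
(GELI is indeed a GEOM class.) A hedged sketch of the kind of GEOM stack
being described, with invented device names, would be:

freebsd# gmirror load                                # software RAID1 class
freebsd# gmirror label -v gm0 /dev/ada1 /dev/ada2
freebsd# geli init /dev/mirror/gm0                   # encryption class on top
freebsd# geli attach /dev/mirror/gm0
freebsd# gjournal load                               # journaling class on top of that
freebsd# gjournal label /dev/mirror/gm0.eli
freebsd# newfs -J /dev/mirror/gm0.eli.journal        # UFS, journal-aware
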

The reason DragonFly doesn't have GEOM is because that's the way they
wanted it to be at least in 2010.
https://www.dragonflybsd.org/mailarchive/users/2010-07/msg00085.html


> So where are HAMMER1 and ZFS in all this story?
>
> On one hand ZFS makes hardware/software RAID obsolete as it is a volume
> manager (in the sense of RAID). It is also a volume manager in the sense
> of LVM, with caveats (ZFS pools on FreeBSD IIRC can be grown only by
> off-lining physical drives and replacing them with larger drives before
> re-silvering). However ZFS brings a lot of new goodies: COW, checksums,
> self-healing, compression, snapshots (the kind people actually use,
> unlike LVM's), and remote replication. It is possible to use a ZFS
> volume as an iSCSI target and to boot from a ZFS volume (all my file
> servers do that and even use beadm). It is the best thing since sliced
> bread if you are willing to reimplement a large part of the Solaris
> kernel (which the FreeBSD people did). ZFS pools, which can be thought
> of as LVM volume groups, can be divided into datasets, which are in
> some sense equivalent to LVM logical volumes.
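
To make the pool/dataset analogy concrete, a rough sketch (pool, dataset
and host names are invented):

freebsd# zpool create tank mirror da0 da1     # pool = volume group + RAID in one step
freebsd# zfs create tank/home                 # dataset, roughly an LVM logical volume
freebsd# zfs snapshot tank/home@before-upgrade
freebsd# zfs send tank/home@before-upgrade | ssh backup zfs recv backup/home
freebsd# zpool replace tank da0 da2           # swap in a bigger disk, then resilver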
>
>
> HAMMER1 is unable to manage volumes in the sense of RAID. It requires
> software or hardware RAID for high availability and redundancy.

This was actually by design for HAMMER1, as mentioned in section 11 of
the paper below from 2008.
https://www.dragonflybsd.org/hammer/hammer.pdf


> To my
> knowledge software RAID level 1 is achieved on DragonFly via the old
> FreeBSD 4.8 framework natacontrol. I am not sure if that thing should be
> used in production any longer. Anyhow, hardware RAID seems the way to go.
> HAMMER1 can't be expanded, so unlike ZFS it is much more of a pure file
> system. However as a file system it is second to none: COW, checksums,
> healing via history, fine-grained journaling, snapshots, etc. The HAMMER1
> equivalent of datasets are pseudo file systems (PFSs), which are very
> cheap (for example the home directory of each user on a file server could
> be a PFS which could be destroyed in the case of a policy violation).
> HAMMER1 comes with built-in backup (mirror-stream). Unfortunately slave
> PFSs are read-only. DF can't boot from HAMMER1. Anyhow the PFSs do indeed
> to some extent look like Linux logical volumes but they are so much more
> advanced. HAMMER1 is NFS and Samba aware (NAS) but I am not sure about
> DragonFly's SAN (iSCSI) capabilities.
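
For what it's worth, the PFS workflow looks roughly like this (the paths
and the shared-uuid placeholder are invented for illustration; see
hammer(8) for the real details):

dfly# hammer pfs-master /data/pfs/home                  # create a master PFS
dfly# mount_null /data/pfs/home /home                   # PFSs are usually null-mounted into place
dfly# hammer pfs-slave /backup/pfs/home shared-uuid=<uuid-of-master>
dfly# hammer mirror-stream /data/pfs/home /backup/pfs/home   # continuous replication to the slave
dfly# hammer pfs-destroy /data/pfs/home                 # PFSs are cheap to create and destroy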
>
>
> I would really appreciate it if people could point out mistakes in the
> above write-up and give me some references so that I can actually learn
> something.
>
>
> I am finishing this post by saying that I am in suspense awaiting the
> release of DF 5.0 and the preview of HAMMER2.
>
> Cheers,
> Predrag


