BSD Magazine article on DragonFlyBSD hammer filesystem volume management

Fri Oct 6 21:34:41 PDT 2017

Siju George wrote:

> Hi,
> 
> I have written an article about DragonFlyBSD hammer filesystem volume
> management. This will be particularly useful for Linux users familiar
> with lvm.
> 
> https://bsdmag.org/download/military-grade-data-wiping-freebsd-bcwipe
> 
> Thanks
> 
> --Siju

I just read the article twice and I am completely confused (probably to
be expected from a guy who is trained in theoretical mathematics and
astronomy). I would like it if somebody could correct my understanding
of things.

On the most basic level OSs manage disks through device nodes. For many
years engineers created filesystems by assuming that the filesystem
would go directly onto the physical disk. The proliferation of commodity
hardware has forced a re-evaluation of this idea. Namely, not all drives
have the same rotational speed, number of platters, etc. So, much like
the French approach in mathematics, somebody came up with the idea of
abstracting the physical devices and no longer lying to OSs about the
geometry of hard drives. That is why all modern HDDs use Logical Block
Addressing (LBA). The abstraction didn't end there.

On the next abstraction layer we have disk slicing (Linux guys oblivious
to BSD partitions call these slices partitions). I am semi-familiar with
two slicing schemes. One is the old MBR scheme, still the default on
OpenBSD, and the other is the GPT scheme. On DragonFly one should always
use the GPT scheme (see man gpt). For example:

dfly# gpt show /dev/da3
      start       size  index  contents
          0          1      -  PMBR
          1          1      -  Pri GPT header
          2         32      -  Pri GPT table
         34          1      0  GPT part - DragonFly Label64
         35  976773100      1  GPT part - DragonFly Label64
  976773135         32      -  Sec GPT table
  976773167          1      -  Sec GPT header
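
As a quick sanity check on the table above, every entry should start
exactly where the previous one ends. A minimal sketch in POSIX shell
arithmetic; the function name check_contiguous is made up for
illustration, and the start/size pairs are copied from the output above:

```shell
#!/bin/sh
# start/size pairs (in sectors) copied from `gpt show /dev/da3` above
layout="0 1
1 1
2 32
34 1
35 976773100
976773135 32
976773167 1"

# Hypothetical helper: report the first gap or overlap, if any.
check_contiguous() {
    expected=0
    while read -r start size; do
        if [ "$start" -ne "$expected" ]; then
            echo "gap or overlap before sector $start"
            return 1
        fi
        expected=$((start + size))
    done <<EOF
$layout
EOF
    echo "contiguous, $expected sectors total"
}

check_contiguous
```

The sector count it reports is the sum of all the entries, i.e. the
size of the whole disk as the GPT sees it.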

On Linux one uses parted to set up the GPT scheme, while FreeBSD uses
gpart. Once you slice your disk (typically Linux will do three slices,
while historically DragonFly has used only one slice (dangerously
dedicated) or two slices as in the example above) we can start talking
about BSD partitions. On the above hard drive the data is located in the
slice indexed by 1. The slice with index 0 is created for historical
reasons so that the drive is not dangerously dedicated.

A slice can be further divided into BSD partitions (up to 16) on
DragonFly (Linux has no equivalent) using the command disklabel64. Let's
look at how the slices on the above disk are divided:

dfly# disklabel64 -r /dev/da3s0
disklabel64: /dev/da3s0: No such file or directory
dfly# disklabel64 -r /dev/da3s1
# /dev/da3s1:
#
# Informational fields calculated from the above
# All byte equivalent offsets must be aligned
#
# boot space:    1059328 bytes
# data space:  488385505 blocks # 476938.97 MB (500106757632 bytes)
#
# NOTE: If the partition data base looks odd it may be
#       physically aligned instead of slice-aligned
#
diskid: a6a0a2ef-a4d1-11e7-98d9-b9aeed3cce35
label: 
boot2 data base:      0x000000001000
partitions data base: 0x000000103a00
partitions data stop: 0x007470bfc000
backup label:         0x007470bfc000
total size:           0x007470bfd800    # 476939.99 MB
alignment: 4096
display block size: 1024        # for partition display only

16 partitions:
#          size     offset    fstype   fsuuid
  a:  104857600          0    4.2BSD    #  102400.000MB
  b:  262144000  104857600    4.2BSD    #  256000.000MB
  a-stor_uuid: 5ad35da2-a4d2-11e7-98d9-b9aeed3cce35
  b-stor_uuid: 5ad35dba-a4d2-11e7-98d9-b9aeed3cce35

As you can see the first slice contains no partitions. The second slice
is partitioned into two parts:

da3s1a and da3s1b 
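
The numbers in the disklabel64 output can be cross-checked against the
gpt output with a bit of shell arithmetic. This is only a sketch: the
helper name blocks_to_mb is made up, and the 512-byte sector size is an
assumption (the gpt output does not state it), although the totals agree
under that assumption:

```shell
#!/bin/sh
# Hypothetical helper: convert a size given in 1024-byte display blocks
# (see "display block size: 1024" in the label) to whole megabytes.
blocks_to_mb() {
    echo $(( $1 * 1024 / 1048576 ))
}

blocks_to_mb 104857600    # partition a
blocks_to_mb 262144000    # partition b

# "total size" field of the label, converted from hex to bytes
echo $((0x007470bfd800))

# size of GPT slice 1 in bytes, ASSUMING 512-byte sectors
echo $((976773100 * 512))
```

The first two results should match the 102400.000MB and 256000.000MB
comments in the partition table, and the last two lines should print the
same number, i.e. the label covers the whole slice.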

One could put a UFS or HAMMER file system on those partitions. It is
totally possible to have different file systems on different partitions.
For example, da3s1a could be formatted to use UFS while da3s1b is
formatted to use HAMMER. DragonFly's cousin FreeBSD can do the same.
Linux can't.

On Linux a slice such as sda1 can't be further subdivided and must
contain a file system like XFS. Not so fast. On Linux some people
decided to add another abstraction layer called the logical volume
manager (LVM). One can combine several physical volumes (not
recommended, as software RAID should be used for that) into an LVM
volume group. The volume group is further divided into logical volumes.
The file system goes on top of the logical volumes. The cool thing about
logical volumes is that they can be expanded and shrunk while mounted.
Actually the coolest thing is that one can take a snapshot of a logical
volume, although I have not seen many Linux users do that in practice.
In terms of file systems the only real choice is XFS. Historically LVM
didn't provide any options for redundancy or parity, thus software RAID
(mdadm) or hardware RAID was needed. It is perfectly OK to initialize
the RAID device /dev/md0 as a volume group, partition it into a few
logical volumes, and then put XFS on top of them.
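
The LVM-on-RAID layering described above might look roughly like the
following. This is a dry-run sketch, not a recipe: it only prints the
commands, and the volume group name vg0, the volume names, and the sizes
are made up for illustration. Only /dev/md0 comes from the text.

```shell
#!/bin/sh
# Dry run: commands are printed, never executed. Change DRY_RUN to 0
# only on a scratch machine, as root, with a real /dev/md0.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lvm_on_raid() {
    md=$1    # RAID device used as the physical volume
    vg=$2    # volume group name (hypothetical)
    run pvcreate "$md"                                  # mark device for LVM
    run vgcreate "$vg" "$md"                            # volume group
    run lvcreate -L 20G -n home "$vg"                   # logical volume
    run mkfs.xfs "/dev/$vg/home"                        # file system on top
    run lvextend -r -L +10G "/dev/$vg/home"             # grow LV and fs
    run lvcreate -s -L 5G -n home_snap "/dev/$vg/home"  # snapshot
}

lvm_on_raid /dev/md0 vg0
```

The order of the commands mirrors the layers: physical volume, volume
group, logical volume, and only then the file system.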

Going back to cousin FreeBSD now. Unlike Linux, FreeBSD uses GEOM
instead of LVM. My understanding is that GEOM combines LVM, RAID, and
even encryption (I am not sure whether GELI is part of GEOM) into one.
GEOM used as LVM allows for UFS journaling.

So where are HAMMER1 and ZFS in all this story? 

On one hand ZFS makes hardware/software RAID obsolete as it is a volume
manager (in the sense of RAID). It is also a volume manager in the sense
of LVM, with caveats (ZFS volumes on FreeBSD IIRC could be grown only by
off-lining physical drives and replacing them with larger drives before
re-silvering). However, ZFS brings a lot of new goodies: COW, checksums,
self-healing, compression, snapshots (the kind that people actually use,
unlike LVM's), and remote replication. It is possible to use a ZFS
volume as an iSCSI target and to boot from a ZFS volume (all my file
servers do that and even use beadm). It is the best thing since sliced
bread if you are willing to reimplement a large part of the Solaris
kernel (which the FreeBSD people did). ZFS pools, which could be thought
of as LVM volume groups, can be divided into datasets, which are in some
sense equivalent to LVM logical volumes.
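
To make the pool/dataset analogy concrete, here is a dry-run sketch of
the corresponding ZFS commands. It only prints the commands; the pool
name tank, the disks da1 and da2, and the dataset and snapshot names are
all made up for illustration.

```shell
#!/bin/sh
# Dry run: commands are printed, never executed.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

zfs_sketch() {
    pool=$1    # pool name (hypothetical)
    run zpool create "$pool" mirror da1 da2          # RAID-like layer
    run zfs create "$pool/home"                      # dataset, roughly an LV
    run zfs set compression=lz4 "$pool/home"         # per-dataset property
    run zfs snapshot "$pool/home@before-upgrade"     # cheap snapshot
    run zfs create -V 10G "$pool/vol0"               # block volume (zvol)
}

zfs_sketch tank
```

Unlike the LVM case there is no separate mkfs step: a dataset is usable
as a file system as soon as it is created.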

HAMMER1 is unable to manage volumes in the sense of RAID. It requires
software or hardware RAID for high availability and redundancy. To my
knowledge software RAID discipline 1 is achieved on DragonFly via
natacontrol, an old framework from FreeBSD 4.8. I am not sure if that
thing should be used in production any longer. Anyhow, hardware RAID
seems the way to go. HAMMER1 can't be expanded, so unlike ZFS it is much
more of a pure file system. However, as a file system it is second to
none: COW, checksums, healing via history, fine-grained journaling,
snapshots, etc. The HAMMER1 equivalents of datasets are pseudo file
systems (PFSs), which are very cheap (for example, the home directory of
each user on a file server could be a PFS, which could be destroyed in
case of a policy violation). HAMMER1 comes with built-in backup
(mirror-stream). Unfortunately slave PFSs are read-only. DF can't boot
from HAMMER1. Anyhow, PFSs do indeed to some extent look like Linux
logical volumes, but they are so much more advanced. HAMMER1 is NFS and
Samba aware (NAS), but I am not sure about DragonFly's SAN (iSCSI)
capabilities.
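
The per-user PFS idea could be sketched with hammer(8) roughly as
follows. Again this is a dry run that only prints the commands. The
paths under /home/pfs and /backup/pfs and the user name are made up, and
the shared-uuid value is a placeholder that would have to be taken from
the pfs-status output of the master.

```shell
#!/bin/sh
# Dry run: commands are printed, never executed.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

hammer_pfs_sketch() {
    user=$1
    master="/home/pfs/$user"      # hypothetical master PFS path
    slave="/backup/pfs/$user"     # hypothetical slave PFS path
    uuid="UUID-FROM-PFS-STATUS"   # placeholder, not a real UUID
    run hammer pfs-master "$master"                   # one PFS per user
    run hammer pfs-status "$master"                   # shows the shared-uuid
    run hammer pfs-slave "$slave" "shared-uuid=$uuid" # read-only slave
    run hammer mirror-stream "$master" "$slave"       # continuous backup
    run hammer pfs-destroy "$master"                  # e.g. policy violation
}

hammer_pfs_sketch alice
```

When run for real, mirror-stream keeps running and streams changes to
the slave, so it would normally be started in the background rather than
in sequence as shown here.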

I would really appreciate it if people could point out mistakes in the
above write-up and give me some references so that I can actually learn
something.

I am finishing this post by saying that I am all in suspense awaiting
the release of DF 5.0 and the preview of HAMMER2.

Cheers,
Predrag