HEADS UP: dm, lvm, cryptsetup and initrd on master

Sun Jul 11 15:14:49 PDT 2010

Hi,

as you know I've been working on getting dm and lvm into DragonFly. I've
just committed my work so far which includes the following:
 - dm kernel part (including linear, stripe and crypt targets)
 - dm userland library
 - lvm userland tools
 - cryptsetup (with LUKS stuff)
 - some further modifications to the initrd stuff

First off, note that this is rather a HEADS DOWN notice; the stuff
should be considered experimental. It has proven safe and reliable in my
tests, but that does not mean that it's bug/problem-free.

The dm kernel part, ported from NetBSD, is compiled by default as a
module now and can be compiled into the kernel, as usual, in the kernel
config. The port from NetBSD includes a linear and stripe target; on top
of that I've also written a crypt target which aims to be (mostly)
compatible with cryptsetup. I've not actually tried reading a dm-crypt
(Linux) volume with my crypto implementation, but volumes created from
scratch with cryptsetup on DragonFly work as expected. I'd definitely
welcome some more testing of the crypt target, as it is the target that
has had the least testing, since it doesn't exist in NetBSD. Next I'll
start working on a mirror target so vinum isn't our only choice anymore
for software RAID1.

The dm/lvm userland tools (basically dmsetup and lvm) offer a set of
tools to manage general dm volumes and, in the case of lvm, specifically
lvm volumes. I've tried them with several LVM volumes and everything
seems to
work just fine. An rc script for lvm is also included so that lvm
volumes are automatically discovered on boot. You just need to set
lvm_enabled="YES" (or lvm="YES") in rc.conf, as usual. Note that the lvm
rc script also needs udevd, but I think that dependency would be enabled
automatically.

Now comes the interesting part: using crypt or lvm volumes as the root
device. And this is where mkinitrd comes in. As the mkinitrd(8) man page
explains, mkinitrd creates an md image that can be used as an early
root, much like Linux's initrd/initramfs. This initrd has a
tiny /sbin/init as well as a set of rc scripts that are run. To enable
the use of initrd, one has to create an initrd image with mkinitrd, add
the tunables as explained in mkinitrd(8) to loader.conf to preload the
initrd.img in /boot, and adjust vfs.root.mountfrom to "ufs:md0s0".
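
As a rough illustration, a loader.conf set up this way might look like
the fragment below. The initrd.img_load and initrd.img_type names are
written from memory and are an assumption here, so take the
authoritative names and values from mkinitrd(8); the realroot line is
just one possible value:

```
# Preload the initrd image from /boot (tunable names assumed, see mkinitrd(8))
initrd.img_load="YES"
initrd.img_type="md_image"
# Boot into the initrd first...
vfs.root.mountfrom="ufs:md0s0"
# ...and tell it where the real root lives
vfs.root.realroot="local:ufs:/dev/vg00/lv0"
```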

By default, the initrd image also runs a discovery of lvm volumes prior
to mounting the real root device. The initrd image reads the loader
tunable vfs.root.realroot (via a sysctl) to determine which device holds
the actual root and how to mount it. Right now the initrd system
supports two types of 'realroot': 'crypt' and 'local'. Both are
documented with all their options in mkinitrd(8), but here are two
examples:
vfs.root.realroot="local:ufs:/dev/vg00/lv0"
vfs.root.realroot="crypt:ufs:/dev/ad0s0a:foocrypt"

The first one tells initrd to mount a local file system, of type ufs,
from /dev/vg00/lv0. This works because the lvm discovery is run by
default before the realroot setting is read and interpreted.

The second one tells initrd to launch the rcmount_crypt script on the
image, which basically will set up the crypt volume that is supposed to
be on /dev/ad0s0a as volume 'foocrypt' (or
rather: /dev/mapper/foocrypt). cryptsetup will then (probably) ask for a
key and set up the volume so it is ready for mounting. Afterwards,
knowing that the volume holds a ufs file system, it will
mount /dev/mapper/foocrypt as a ufs file system as the new root device.
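
To make the realroot format a bit more concrete, here is a minimal
sketch of how such a value could be split into its fields and turned
into the steps described above. This is not the actual initrd rc code:
the function name describe_realroot is made up for illustration, the use
of "cryptsetup luksOpen" assumes a LUKS volume, and the function only
prints the commands instead of running them.

```shell
# Hypothetical helper, not part of the initrd image: prints the commands
# that would bring up the real root described by a realroot string of
# the form type:fstype:device[:volname].
describe_realroot() {
    spec=$1
    # Split the value on colons; the last field is empty for 'local'.
    IFS=: read -r rr_type rr_fs rr_dev rr_name <<EOF
$spec
EOF
    case "$rr_type" in
    local)
        # lvm discovery has already run, so the device can be mounted.
        echo "mount -t $rr_fs $rr_dev /new_root"
        ;;
    crypt)
        # Assumption: a LUKS volume opened with 'cryptsetup luksOpen'.
        echo "cryptsetup luksOpen $rr_dev $rr_name"
        echo "mount -t $rr_fs /dev/mapper/$rr_name /new_root"
        ;;
    *)
        echo "unknown realroot type: $rr_type" >&2
        return 1
        ;;
    esac
}
```

Calling describe_realroot with the two example values from above prints
one mount command for the 'local' case, and a cryptsetup command
followed by a mount of /dev/mapper/foocrypt for the 'crypt' case.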

After this initialization sequence has run, the initrd init will take
back control, switch the root and pass control to the /sbin/init on the
real root device, and everything will (hopefully) work as expected.

If you care about the details of how this works, here they are:
the /sbin/init on the initrd will chroot into /new_root (which is the
mount point of the new root, as the name gives away) and also issue the
chroot_kernel system call, which will then set the 'rootnch' and
'rootvnode' globals in the kernel to the /new_root nch and vnode
entries. After this process, the initrd init will execv into the
actual /sbin/init.

I'd like to note here that I also see other possible use cases for the
initrd image; it can be used to configure any complicated root device
without having to write complicated kernel code to handle it. An iSCSI
target root should be easily achievable with this approach, just to name
one further possibility.

The whole process is not without its drawbacks: since the md image is
preloaded by the loader, there is no way of freeing up the memory used
by it (15 MB currently, but could be shrunk to 7 or 8 MB). Ideally we
would want an initrd image that can be completely freed after using it
to reclaim any otherwise wasted memory.

Cheers,
Alex Hornung