PFS in HAMMER1
Tomohiro Kusumi
kusumi.tomohiro at gmail.com
Wed Jun 15 17:26:52 PDT 2016
There was a discussion about @@ in PFS paths on the IRC channel a few
days ago. This post has nothing to do with that discussion, but it
explains what @@ really means. This is very tricky, so most users have
probably had a difficult time understanding what @@ means.
Note that non-master (slave) PFS is a bit different, but this post
only explains master PFS.
---
First, create a filesystem and mount it.
[root@]~# newfs_hammer -L TEST /dev/da1 > /dev/null
[root@]~# mount_hammer /dev/da1 /HAMMER
[root@]~# cd /HAMMER
Create a PFS within that fs. Note that the location does matter when
more than one HAMMER fs is mounted, as in this case. If you want to
create a PFS in this fs, it needs to be done somewhere under /HAMMER.
[root@]/HAMMER# hammer pfs-master test1 > /dev/null
[root@]/HAMMER# ls -l
total 0
lrwxr-xr-x 1 root wheel 10 Jun 15 21:15 test1 -> @@-1:00001
In this example, "test1" is just a regular symlink created by
/sbin/hammer. Not having the symlink does no harm to HAMMER. You could
rm test1 if you want to, as long as you remember that "@@-1:00001",
which is PFS#1, exists. /sbin/hammer just makes the symlink for you so
that ls can show "@@-1:00001" via readlink(2).
[root@]/HAMMER# cd test1
[root@]/HAMMER/test1# touch aaa
[root@]/HAMMER/test1# cd ..
[root@]/HAMMER# rm test1
remove test1? y
[root@]/HAMMER# ls -l
total 0
[root@]/HAMMER# ls @@-1:00001
aaa
[root@]/HAMMER# ln -s @@-1:00001 test1
[root@]/HAMMER# ls test1
aaa
"test1" is actually a symlink to "@@PFS00001" rather than
"@@-1:00001". HAMMER translates "@@PFS00001" to "@@-1:00001" in
readlink(2), so ls prints it in the "@@-1:00001" format instead of
"@@PFS00001". In the example below, test2 is a link to "@@PFS00001",
but ls shows "@@-1:00001" just like it does for test1. "test1" created
by /sbin/hammer and "test2" created by ln point to the same PFS.
[root@]/HAMMER# ls -l
total 0
lrwxr-xr-x 1 root wheel 10 Jun 15 21:16 test1 -> @@-1:00001
[root@]/HAMMER# ln -s @@PFS00001 test2
[root@]/HAMMER# ls -l
total 0
lrwxr-xr-x 1 root wheel 10 Jun 15 21:16 test1 -> @@-1:00001
lrwxr-xr-x 1 root wheel 10 Jun 15 21:18 test2 -> @@-1:00001
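The name translation seen through readlink(2) above can be imitated in
plain sh. This is only a sketch of the visible behavior for a master
PFS, not HAMMER's kernel code; canonical_pfs_name is a hypothetical
helper, and the five-digit PFS# is assumed from the examples in this
post.

```shell
# Hypothetical helper: map the stored link target "@@PFSnnnnn" to the
# "@@-1:nnnnn" form that ls shows for a master PFS.
canonical_pfs_name() {
    case $1 in
        @@PFS[0-9][0-9][0-9][0-9][0-9])
            # Drop the "@@PFS" prefix and keep the 5-digit PFS#.
            echo "@@-1:${1#@@PFS}"
            ;;
        *)
            # Anything else is passed through unchanged.
            echo "$1"
            ;;
    esac
}

canonical_pfs_name '@@PFS00001'
```

Slave PFS are not covered here, since they are named differently.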
"@@-1:00001" is the canonical name for this PFS. "@@-1:00001" doesn't
physically exist as a directory entry; it represents a PFS, which is a
logically separated pseudo filesystem space within HAMMER's B-Tree.
Because "@@-1:00001" doesn't exist as a directory entry, it never
appears in ls output. This is why /sbin/hammer creates a symlink: to
make the existence of the PFS visible.
HAMMER has only one large B-Tree per filesystem (not per PFS), so all
the PFS exist within that single B-Tree. PFS are separated by the
localization parameter, which is one of the B-Tree keys used to look
up elements in the tree.
Each substring in "@@-1:00001" means the following.
1. "@@" means it's a PFS or snapshot.
2. "-1" means it's a master.
3. ":" is just a separator.
4. "00001" means it's PFS#1, where PFS#0 is the default PFS created on
   newfs. There is no "00000" because that's what's mounted on /HAMMER.
   The PFS# is used as the localization parameter.
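As a rough illustration of the breakdown above, a master PFS name can
be taken apart in plain sh. parse_pfs_name is a hypothetical helper
(it is not part of /sbin/hammer), and it assumes the five-digit PFS#
seen in the examples in this post.

```shell
# Hypothetical helper: split a master PFS name such as "@@-1:00001".
parse_pfs_name() {
    name=$1
    case $name in
        @@-1:[0-9][0-9][0-9][0-9][0-9]) ;;
        *)
            echo "not a master PFS name: $name" >&2
            return 1
            ;;
    esac
    # Strip "@@" (PFS or snapshot), "-1" (master) and ":" (separator).
    digits=${name#@@-1:}
    # Drop leading zeros; an all-zero field becomes the empty string.
    id=$(printf '%s' "$digits" | sed 's/^0*//')
    echo "master PFS#${id:-0}"
}
```

For example, parse_pfs_name '@@-1:00001' prints "master PFS#1", while
a plain name such as test1 is rejected with a non-zero exit status.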
The localization parameter has the highest priority when inserting or
looking up B-Tree elements, so fs elements that belong to the same
PFS# tend to be localized (clustered) within the B-Tree as shown
below.
Accessing "@@-1:00001" means asking HAMMER to dive into PFS#1's root
inode, where all elements of PFS#1 are clustered. Accessing
"@@-1:00002" does the same for PFS#2.
          HAMMER root-inode
               /HAMMER
                ////\\
               /  /   \
              /  ...   \
             /   ....   \
          PFS1          PFS2
         ////\\        ////\\
        //////\\      //////\\
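The clustering effect of a leading localization key can be imitated
with sort(1). This is only a toy model: the three-column records
("localization obj_id name") and the cluster_by_localization helper
are made up for illustration, and real HAMMER B-Tree keys have more
fields than this.

```shell
# Toy model: order records by localization first, then by object id,
# the way a B-Tree with localization as its leading key would.
cluster_by_localization() {
    sort -k1,1n -k2,2n
}

# Records arrive in arbitrary order: "localization obj_id name".
printf '%s\n' \
    '2 300 ccc' \
    '1 200 aaa' \
    '0 100 a' \
    '1 150 bbb' \
    '2 120 ddd' | cluster_by_localization
```

After sorting, all records with localization 1 sit next to each other,
followed by all records with localization 2, no matter what order they
were inserted in.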
The snapshot format "@@0x00..." works similarly to PFS, except that it
filters the B-Tree by transaction id (0x00...) instead of diving into
a clustered subtree.
          HAMMER root-inode
               /HAMMER
                ////\\
               //////\\
              // /////\\
             // //// //\\  <--filtered by TID
            /// //// ///\\
           ///// /// ////\\
          /// //// /////\\
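Filtering by transaction id can be sketched the same way. The example
below assumes that each element carries a creation TID and a deletion
TID (0 meaning "not deleted"), and that an element is visible as of a
given TID if it was created at or before that TID and not yet deleted
at that point. That rule, the decimal TIDs, the record layout and the
visible_at helper are all assumptions made for illustration, not a
description of HAMMER's actual code.

```shell
# Toy model: records are "create_tid delete_tid name"; delete_tid 0
# means the element is still live.
visible_at() {
    # $1 is the as-of TID; print the names visible at that point.
    awk -v asof="$1" '$1 <= asof && ($2 == 0 || $2 > asof) { print $3 }'
}

printf '%s\n' \
    '10 0 aaa' \
    '20 30 bbb' \
    '40 0 ccc' | visible_at 25
```

With an as-of TID of 25, only aaa and bbb are printed: ccc was created
later, and bbb would disappear again once the as-of TID reaches 30.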
Since "@@-1:00001" is not a directory entry of any directory, a weird
thing can happen, as shown below. cd to a/b/c/d/e/f/g/test1 obviously
fails because there is no such directory entry, but cd to
a/b/c/d/e/f/g/@@-1:00001 doesn't fail. This means you can use whatever
parent directories you want in order to access a PFS, because HAMMER's
namei doesn't care about the parents of "@@-1:00001", given that no
directory has such an entry. I think this is a design issue of
HAMMER1.
[root@]/HAMMER# mkdir -p a/b/c/d/e/f/g
[root@]/HAMMER# cd a/b/c/d/e/f/g/test1
cd: no such file or directory: a/b/c/d/e/f/g/test1
[root@]/HAMMER# cd a/b/c/d/e/f/g/@@-1:00001
[root@]/HAMMER/a/b/c/d/e/f/g/@@-1:00001# pwd
/HAMMER/a/b/c/d/e/f/g/@@-1:00001
[root@]/HAMMER/a/b/c/d/e/f/g/@@-1:00001# ls
aaa
PFS created by DragonFly's installer use both symlinks and nullmounts.
Things under /pfs are symlinks to PFS created by /sbin/hammer. (Note
that PFS are no longer used by the installer, as mentioned in
http://lists.dragonflybsd.org/pipermail/users/2015-December/228472.html)
[root@]~# ls -l /pfs
total 0
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 home -> @@-1:00003
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 tmp -> @@-1:00002
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 usr.obj -> @@-1:00004
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 var -> @@-1:00001
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 var.crash -> @@-1:00005
lrwxr-xr-x 1 root wheel 10 Aug 26 2015 var.tmp -> @@-1:00006
These /pfs/xxx symlinks are nullmounted on /xxx,
[root@]~# cat /etc/fstab | grep "/pfs"
/pfs/var /var null rw 0 0
/pfs/tmp /tmp null rw 0 0
/pfs/home /home null rw 0 0
/pfs/usr.obj /usr/obj null rw 0 0
/pfs/var.crash /var/crash null rw 0 0
/pfs/var.tmp /var/tmp null rw 0 0
so the result looks something like the following.
[root@]~# mount | grep "/pfs"
/pfs/@@-1:00001 on /var (null, local)
/pfs/@@-1:00002 on /tmp (null, local)
/pfs/@@-1:00003 on /home (null, local)
/pfs/@@-1:00004 on /usr/obj (null, local)
/pfs/@@-1:00005 on /var/crash (null, local)
/pfs/@@-1:00006 on /var/tmp (null, local)
Though the symlinks are made by /sbin/hammer, symlinks and nullmounts
have nothing to do with the HAMMER fs itself. PFS are always
accessible in the "@@-1:00001" format because this is the real name of
a PFS in terms of a filesystem path, while the others are just aliases
or loopback mounts which may or may not exist depending on the
configuration.