Regarding Filesystems

Sun Jul 27 17:53:41 PDT 2008

:Hello Sir,
:
:Excuse me if my questions are somehow stupid.
:
:As I was following along, DragonFlyBSD got the new HAMMER FS.  Some
:questions got pushed onto my head stack.
:If I am not wasting your time, can you please answer them?
:
:1- I know that UFS divides the disk into cylinder groups.  Does this have
:anything to do with the disk's physical cylinders, i.e. are they the
:same thing ?

    In the good old days knowing the physical characteristics of the disk
    greatly improved performance.  For example, if the seek latency for
    getting from one track to another was known then the write layout could
    be adjusted to skip the appropriate number of sectors on the track
    boundary so as to be able to stream file data at the full platter rate
    and not have to wait for a full rotation due to missing the next sector.
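
    The skew arithmetic can be sketched roughly as follows.  This is
    only an illustration, not FFS code: the function names and the drive
    figures (3600 RPM, 32 sectors per track, 2ms track switch) are
    made-up assumptions.

```python
import math

def track_skew_sectors(rpm, sectors_per_track, track_switch_ms):
    """Sectors that pass under the head while it switches tracks (rounded up)."""
    rotation_ms = 60_000.0 / rpm
    sector_ms = rotation_ms / sectors_per_track
    return math.ceil(track_switch_ms / sector_ms) % sectors_per_track

def wait_after_switch_ms(rpm, sectors_per_track, track_switch_ms, skew):
    """Rotational wait before the first sector of the next track arrives."""
    rotation_ms = 60_000.0 / rpm
    sector_ms = rotation_ms / sectors_per_track
    # How far (in sectors) the platter turned during the track switch.
    passed = track_switch_ms / sector_ms
    # The next track's first sector was laid out `skew` sectors further round.
    return ((skew - passed) % sectors_per_track) * sector_ms

# Hypothetical drive: 3600 RPM, 32 sectors per track, 2ms track switch.
skew = track_skew_sectors(3600, 32, 2.0)
with_skew = wait_after_switch_ms(3600, 32, 2.0, skew)
without_skew = wait_after_switch_ms(3600, 32, 2.0, 0)
```

    With these assumed figures the skewed layout waits for a fraction of
    one sector time, while the unskewed layout waits for most of a
    rotation on every track boundary.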

    UFS, or more specifically the FFS improvements made to UFS circa 1984,
    took advantage of this knowledge to greatly improve the performance
    of the filesystem.

    Modern drives pack 4-16MB or more of cache RAM, do read-ahead, and
    buffer writes.  This smooths out those issues, so knowing the
    physical disk layout is no longer much of an advantage for a
    filesystem.  Also, all modern drives use variable-sector geometries,
    so the number of sectors per track decreases as the head moves
    closer to the center of the disk.  Plus all modern drives will
    relocate sectors to work around media errors.  It gets kinda silly
    for the filesystem to try to track all of that.

:2- In a journaling FS, does the journal log all the meta-data itself, or
:just the operations to be performed on it,
:like DELETE INODE 800 ?

    It depends.  I think it has generally been recognized that
    meta-data-only journaling is superior to full journaling.  There
    are many kinds of journals.  Some do forward logging of events and
    can 'REDO' the operations when recovering from a crash.  Some will
    log UNDO information so as to be able to perform a rollback when
    recovering from a crash.  Some do both.  Varying amounts of information
    are logged.  Sometimes high level operations are logged, such as
    'DELETE INODE 800'; other times physical byte-range modifications
    are logged.  Sometimes both.  Sometimes sufficient information is logged
    to be able to UNDO *and* REDO (run the journal in either direction)...
    for example, DragonFly's generic high-level journaling can do that.
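
    As a toy illustration of a journal that can be run in either
    direction, here is a minimal sketch of physical byte-range logging
    that keeps both the old bytes (UNDO) and the new bytes (REDO).  The
    Journal class and its methods are hypothetical and are not
    DragonFly's journaling code; writes are assumed to stay within the
    bounds of the media.

```python
class Journal:
    """Toy byte-range journal storing (offset, old_bytes, new_bytes) records."""

    def __init__(self):
        self.records = []

    def write(self, media, offset, data):
        # Capture the UNDO image before overwriting, then apply the change.
        old = bytes(media[offset:offset + len(data)])
        self.records.append((offset, old, bytes(data)))
        media[offset:offset + len(data)] = data

    def redo(self, media):
        # Roll forward: replay the new bytes in the order they were logged.
        for offset, _old, new in self.records:
            media[offset:offset + len(new)] = new

    def undo(self, media):
        # Roll back: restore the old bytes in reverse order.
        for offset, old, _new in reversed(self.records):
            media[offset:offset + len(old)] = old

media = bytearray(b"AAAAAAAA")
journal = Journal()
journal.write(media, 2, b"xy")
journal.write(media, 3, b"ZZ")
after_writes = bytes(media)

journal.undo(media)            # back to the pre-journal state
after_undo = bytes(media)

replica = bytearray(b"AAAAAAAA")
journal.redo(replica)          # bring a stale copy forward
```

    A REDO-only journal would drop the old bytes from each record, and
    an UNDO-only journal would drop the new bytes.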

:3- What is actually the difference between a snapshot and a revision (I
:know that some file systems support different versions of the same
:file) ?

    I'm not sure there is a difference per se, but snapshots tend to
    represent whole filesystems while revisions usually represent changes
    made to individual files.

:4- What does the B-tree actually hold in HAMMER, does it hold the inodes?

    Everything.  Inodes, directory entries, file data blocks, symlinks...
    everything.
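
    To illustrate how one keyed index can hold several kinds of records,
    here is a minimal sketch.  It is not HAMMER's on-disk format: the
    (obj_id, rec_type, key) key layout, the record type constants and
    the Index class are assumptions made up for the example, and a
    sorted list stands in for the B-tree.

```python
import bisect

# Hypothetical record types; the real on-disk values are not shown here.
INODE, DIRENT, DATA = 1, 2, 3

class Index:
    """Sorted-key index standing in for a B-tree."""

    def __init__(self):
        self._keys = []   # sorted (obj_id, rec_type, key) tuples
        self._vals = []   # values, parallel to _keys

    def insert(self, obj_id, rec_type, key, value):
        k = (obj_id, rec_type, key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._vals[i] = value          # overwrite an existing record
        else:
            self._keys.insert(i, k)
            self._vals.insert(i, value)

    def scan(self, obj_id, rec_type):
        """Return (key, value) pairs of one record type for one object."""
        lo = bisect.bisect_left(self._keys, (obj_id, rec_type))
        hi = bisect.bisect_left(self._keys, (obj_id, rec_type + 1))
        return [(k[2], v) for k, v in zip(self._keys[lo:hi], self._vals[lo:hi])]

idx = Index()
idx.insert(1, INODE, 0, {"mode": 0o755})     # a directory's inode
idx.insert(1, DIRENT, "etc", 2)              # name -> object id
idx.insert(1, DIRENT, "bin", 3)
idx.insert(2, INODE, 0, {"mode": 0o644})     # a file's inode
idx.insert(2, DATA, 16384, b"second block")  # file offset -> data
idx.insert(2, DATA, 0, b"first block")
```

    Because the keys sort by object first, everything belonging to one
    object sits next to each other, and a range scan picks out its
    directory entries or data blocks in order.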

:5- Why not use VFS-level journaling like FreeBSD gjournal, which works
:on any FS ?

    High level journaling can be used for mirroring/replication,
    multi-master replication, and audit-trail, but typically cannot be
    used to recover a filesystem after a crash.

    Block level journaling can be used for low level mirroring but typically
    requires fairly significant interactions with the filesystem layer if
    the intention is to also use it to recover a filesystem after a crash.

    Journals are queued entities, typically generating a serialized stream
    which must be stored somewhere.  This means that journals which are
    not directly integrated into the filesystem implementation tend to
    have extremely severe restrictions on how they can be used.  For example,
    you cannot use something like gjournal for mirroring unless the mirror
    is online 100% of the time, and no interruption in the stream can be
    tolerated.  Such journals can also create large bottlenecks in
    performance.

    I have to say that I don't think there is much of a point to using a
    geom-layer journal like gjournal, just on general principles.  You
    have to be willing to live within its very limited capabilities and
    there really isn't any flexibility there for future improvement.

    Integrated journals are another matter.  Filesystems which are designed
    around journaled operation do not have the same limitations.

:6- I read before that block devices are gone from FreeBSD and
:DragonFlyBSD, so how are disks, which are block devices, managed ?
:It's a bit confusing.
:
:Sorry if I wasted your time with my questions.
:
:Thanks a lot

    Block devices still exist.  What was removed were 'buffered' block
    devices.  The system no longer gives userland direct access to the
    buffer cache, and any block device access made by userland will be
    unbuffered -- direct I/O essentially.  Filesystems still have access
    to the buffer
    cache layer and use it to buffer their own interactions with the
    underlying block device.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>