Regarding Filesystems
Matthew Dillon
dillon at apollo.backplane.com
Sun Jul 27 17:53:41 PDT 2008
:Hello Sir,
:
:Excuse me if my questions was some how stupid.
:
:As I was following, DragonFlyBSD got the new HAMMER FS. some
:questions pushed up my head stack.
:If I ain't wasting your time, can u please answer my cents.
:
:1- I know that UFS divides the disk into cylinder group, does this has
:anything to do with disk's physical cylinders, let's say they are the
:same thing ?
In the good old days knowing the physical characteristics of the disk
greatly improved performance. For example, if the seek latency for
getting from one track to another was known then the write layout could
be adjusted to skip the appropriate number of sectors on the track
boundary so as to be able to stream file data at the full platter rate
and not have to wait for a full rotation due to missing the next sector.
UFS, or more specifically the FFS improvements made to UFS circa 1984,
took advantage of this knowledge to greatly improve the performance
of the filesystem.
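
As a rough illustration of the track-skew arithmetic described above,
here is a minimal sketch. The function name and the drive numbers
(3600 RPM, 32 sectors per track, 3 ms track switch) are assumptions
chosen for the example, not figures from any particular drive or from
the FFS code.

```python
import math

def track_skew_sectors(rpm, sectors_per_track, track_switch_ms):
    """Number of sectors to offset the start of the next track by.

    Hypothetical helper: while the head moves to the next track the
    platter keeps spinning, so the next logical sector should be placed
    just past wherever the head will land.
    """
    rotation_ms = 60_000.0 / rpm                 # one full rotation
    sector_ms = rotation_ms / sectors_per_track  # one sector passing the head
    # Round up: arriving slightly early costs a short wait, arriving
    # late costs nearly a full rotation.
    return math.ceil(track_switch_ms / sector_ms) % sectors_per_track

# Illustrative numbers only.
skew = track_skew_sectors(rpm=3600, sectors_per_track=32, track_switch_ms=3.0)
print(skew)
```

With these made-up numbers one sector passes the head roughly every
0.52 ms, so a 3 ms track switch means skipping 6 sectors.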
Modern drives pack 4-16MB of cache RAM on the drive, sometimes more,
will do read-ahead, and will buffer writes. This smooths out the
issues, and it isn't so much of an advantage any more for a filesystem
to know the physical disk layout. Also, all modern drives use
variable-sector geometries, so the number of sectors per track tends
to decrease as the head moves closer to the center of the disk. Plus
all modern drives will move sectors around to get around media errors.
It gets kinda silly for the filesystem to try to track all of that.
:2- In a journaling FS, does the journal log all the meta-data itself, or
:just the operations to be performed on it,
:like DELETE INODE 800 ?
It depends. I think it has generally been recognized that
meta-data-only journaling is superior to full journaling. There
are many kinds of journals. Some do forward logging of events and
can 'REDO' the operations when recovering from a crash. Some will
log UNDO information so as to be able to perform a rollback when
recovering from a crash. Some do both. Varying amounts of information
are logged. Sometimes high level operations are logged, e.g.
'DELETE INODE 800', other times physical byte-range modifications
are logged. Sometimes both. Sometimes sufficient information is logged
to be able to UNDO *and* REDO (run the journal in either direction)...
for example, DragonFly's generic high-level journaling can do that.
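
To make the UNDO/REDO distinction concrete, here is a minimal sketch
of a physical byte-range journal that records both the old and the new
contents of each modification, so it can be run in either direction.
The Record layout and the function names are hypothetical and are not
DragonFly's journal format; the "disk" is just a bytearray, and writes
are assumed to stay within its bounds.

```python
from dataclasses import dataclass

@dataclass
class Record:
    offset: int     # where the byte range starts
    before: bytes   # old contents (UNDO information)
    after: bytes    # new contents (REDO information)

def apply_change(disk, journal, offset, data):
    """Log the before/after images, then modify the disk in place."""
    before = bytes(disk[offset:offset + len(data)])
    journal.append(Record(offset, before, bytes(data)))
    disk[offset:offset + len(data)] = data

def redo(disk, journal):
    """Roll forward: replay records oldest first."""
    for rec in journal:
        disk[rec.offset:rec.offset + len(rec.after)] = rec.after

def undo(disk, journal):
    """Roll back: restore old contents, newest record first."""
    for rec in reversed(journal):
        disk[rec.offset:rec.offset + len(rec.before)] = rec.before

disk = bytearray(b"aaaaaaaa")
journal = []
apply_change(disk, journal, 2, b"XY")
apply_change(disk, journal, 3, b"ZZ")   # overlaps the first change
modified = bytes(disk)

undo(disk, journal)                     # back to the original contents
rolled_back = bytes(disk)

redo(disk, journal)                     # forward again
rolled_forward = bytes(disk)
```

A REDO-only journal would drop the before field and an UNDO-only
journal would drop the after field; a high-level journal would log the
operation itself rather than byte ranges.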
:3- What is actually the different between a snapshot and revision (i
:know that some file system supports different versions of the same
:file) ?
I'm not sure there is a difference per se, but snapshots tend to
represent whole filesystems while revisions usually represent changes
made to individual files.
:4- What do the B-tree actually hold in HAMMER, do they hold the inodes?
Everything. Inodes, directory entries, file data blocks, symlinks...
everything.
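
One way to picture a single index holding every kind of record is
sketched below. This is only an illustration: the key layout
(object id, record type, key), the record-type constants and the Index
class are assumptions made for the example, not HAMMER's actual
on-disk format, and a sorted list stands in for the B-tree.

```python
import bisect

# Hypothetical record types for the illustration.
INODE, DIRENT, DATA = 1, 2, 3

class Index:
    """Sorted (obj_id, rec_type, key) -> value map; a B-tree stand-in."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def insert(self, obj_id, rec_type, key, value):
        k = (obj_id, rec_type, key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._vals[i] = value          # overwrite an existing record
        else:
            self._keys.insert(i, k)
            self._vals.insert(i, value)

    def scan(self, obj_id, rec_type):
        """Return (key, value) pairs of one type for one object, in key order."""
        i = bisect.bisect_left(self._keys, (obj_id, rec_type))
        out = []
        while i < len(self._keys) and self._keys[i][:2] == (obj_id, rec_type):
            out.append((self._keys[i][2], self._vals[i]))
            i += 1
        return out

idx = Index()
idx.insert(1, INODE, 0, {"mode": 0o755})          # directory inode
idx.insert(1, DIRENT, "hello.txt", 2)             # name -> object id 2
idx.insert(2, INODE, 0, {"mode": 0o644})          # file inode
idx.insert(2, DATA, 16384, b"second block")       # keyed by file offset
idx.insert(2, DATA, 0, b"first block")

blocks = idx.scan(2, DATA)
names = idx.scan(1, DIRENT)
```

Because everything shares one sorted key space, looking up an inode,
listing a directory, or reading a file's blocks is the same kind of
range scan.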
:5- Why not use VFS-level journaling like FreeBSD gjournal, which works
:on any FS ?
High level journaling can be used for mirroring/replication,
multi-master replication, and audit-trail, but typically cannot be
used to recover a filesystem after a crash.
Block level journaling can be used for low level mirroring but typically
requires fairly significant interactions with the filesystem layer if
the intention is to also use it to recover a filesystem after a crash.
Journals are queued entities, typically generating a serialized stream
which must be stored somewhere. This means that journals which are
not directly integrated into the filesystem implementation tend to
have extremely severe restrictions on how they can be used. For example,
you cannot use something like gjournal for mirroring unless the mirror
is online 100% of the time, and no interruption in the stream can be
tolerated. Such journals can also create large bottlenecks in
performance.
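
The "no interruption in the stream" restriction can be illustrated
with a toy consumer of a serialized journal stream. This is a
hypothetical sketch, not gjournal's actual protocol: the Mirror class,
the sequence numbers and the StreamGap exception are all invented for
the example.

```python
class StreamGap(Exception):
    """Raised when a record arrives out of sequence."""

class Mirror:
    """Toy mirror that applies journal records strictly in order."""

    def __init__(self):
        self.next_seq = 0
        self.applied = []

    def receive(self, seq, record):
        # A serialized stream is only meaningful if every record is
        # applied in order; one missing record invalidates the rest.
        if seq != self.next_seq:
            raise StreamGap(f"expected record {self.next_seq}, got {seq}")
        self.applied.append(record)
        self.next_seq += 1

mirror = Mirror()
mirror.receive(0, "write block 10")
mirror.receive(1, "write block 11")

# The mirror was offline while record 2 was generated.
try:
    mirror.receive(3, "write block 13")
    gap_detected = False
except StreamGap:
    gap_detected = True
```

Once a gap occurs, a journal that lives outside the filesystem has no
cheap way to work out what the mirror missed, which is the kind of
restriction described above.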
I have to say that I don't think there is much of a point to using a
geom-layer journal like gjournal, just on general principles. You
have to be willing to live within its very limited capabilities and
there really isn't any flexibility there for future improvement.
Integrated journals are another matter. Filesystems which are designed
around journaled operation do not have the same limitations.
:6- I read before that block devices are gone from FreeBSD and
:DragonFlyBSD, then how the disks are managed which are block devices,
:its bit confusing ?
:
:Sorry if I wasted your time for my questions.
:
:Thanks a lot
Block devices still exist. What was removed were 'buffered' block
devices. The system no longer gives userland direct access to the buffer
cache and any block device access made by userland will be unbuffered --
direct I/O essentially. Filesystems still have access to the buffer
cache layer and use it to buffer their own interactions with the
underlying block device.
-Matt
Matthew Dillon
<dillon at backplane.com>