Description of the Journaling topology
Matthew Dillon
dillon at apollo.backplane.com
Wed Dec 29 16:35:45 PST 2004
:All this work on the VFS layer looks very exciting and I agree with most
:of what you've said, especially the "Solaris did it that way" comment. To
:get to the point, I have a couple of questions about this
:implementation.
:
:Where will the log reside? As a special file in /, at the end of the
:partition, in another section of the disk? I take it a transparent
:migration from normal UFS to journaled UFS will be provided, at least I
:hope so :)
The log is just a file descriptor, which means that it could represent
a special journaling device, a pipe to a process, a regular file, and
in particular it could represent a socket piping the journaled data
to an off-site machine.
The plan is to evolve this basic mechanism into a more sophisticated
one as time passes, introducing a stream in the reverse direction to
allow the journaling target to tell the journaling system when a
piece of data has been physically committed to hard storage. This
information could in turn be fed back to a journal-aware filesystem,
but I would stress that awareness of the journal by the filesystem
is not a requirement. One can reap huge benefits from the journaling
mechanism whether the filesystem is aware of it or not.
:Are you going to do this as a "black box" or provide an API for
:tuning/configuring? I had this idea of writing a journaling FS for
:FreeBSD some time ago, even wrote some code and one of my ideas was to
The descriptor is roughly equivalent to the "black box". The
kernel's journaling layer will not know or care where the journaled
data is actually going.
:provide a mechanism of hinting the FS with what you wanted to do with
:the file (IIRC VxFS has some of these functions). The idea was that you could
:open any file and issue a set of ioctl calls saying e.g. I want RAW
:access to this file, no buffer cache involved (this one I called direct
:i/o), or I want this file deleted in a secure way, or I want
:a kernel event generated whenever an unauthorized user/program
:attempts to access the file. Stuff like that.
This is a more traditional approach to journaling... that is, fully
integrating it into the filesystem, but it also greatly reduces the
flexibility of a journaling system. Given a properly managed VFS
API into the filesystem, it should not be necessary to actually integrate
the journaling code into the filesystem at a thousand different points.
It would suffice to simply supply feedback to a journal-aware filesystem
with regards to when the journaled data has been committed to hard
storage.
:A problem I found was that concurrent access to the logging system
:exposes new problems you might not have taken into account when first
:designing such subsystem, but I'm sure you've thought about that
:already. What's your solution for a mutex-less concurrent access to the
:log? Is it possible to do without mutexes at all?
Yes and no. Concurrent access to the journal 'stream' is easy: you
just encapsulate the data into short-lived logical 'streams' (you'll
probably see the protocol spec commit on that tomorrow). Each
stream represents a transaction, which typically means a VFS operation
(write, create, rmdir, mkdir, link, truncate, etc).
Here is a quick example:
    multiple journaling sources, one target

    [------------------- IN MEMORY FIFO ----------------------]
    [strmid,bytes][stream data]...[strmid,bytes][stream data]...

    [id1][data] [id2][data] [id1][data] [id3][data] [id2][data] ...
     ^                       ^
     process 1               process 1
     blocks                  unblocks
This allows both blocking and non-blocking transactions to be output
to the journal as the data becomes available without tripping over
or stalling each other out.
Mutexless operation is a different issue. In the current scheme I am
implementing there is a single memory FIFO going between the N processes
issuing VFS ops and the (1) worker thread that is responsible for
writing the journal out.
However, with the logical stream abstraction there is no reason why we
couldn't have a per-cpu FIFO (one FIFO per cpu per journal). Since
DragonFly processes do not migrate between cpus while operating in the
kernel, even if they block, the serialization abstraction that the
logical stream mechanism provides would still work, and a per-cpu
memory FIFO would not require any mutexes to operate. The worker thread
would collect the data from all available FIFOs rather than from a single
FIFO. I am not going to do this initially because it is unnecessarily
complex at this early stage, but the API will easily support such an
implementation.
:Other stuff I wanted to implement, but isn't really related to logging,
:was alternate streams (like NTFS has) and rich metadata, like BeFS had.
:What do you think about that?
Sure, and you have the choice of either integrating the mechanism
into a filesystem directly or providing an 'emulation' layer in the
kernel that provides the same API but is able to run on top of
filesystems that are not alternative-streams aware.
:I remember someone (maybe you) talking about per cylinder group dirty
:flags on UFS some time ago, to reduce fsck time. This could also be a
:nice addition, although this is definitely FS-dependent code.
I believe this is doable for UFS and could greatly reduce fsck
times. It isn't on my hot list (too many other interesting things
in my queue that I want to do).
:Like I've said, this looks very interesting and exciting area to work
:in. :)
Yup!
:Cheers,
:--
:Miguel Mendez <flynn at xxxxxxxxxxxxxxxxxx> | lea gfx_lib(pc),a1
More information about the Kernel mailing list