Description of the Journaling topology

Matthew Dillon dillon at apollo.backplane.com
Wed Dec 29 16:35:45 PST 2004


:All this work on the VFS layer looks very exciting and I agree with most
:of what you've said, especially the "Solaris did it that way" comment. To
:get to the point, I have a couple of questions about this
:implementation.
:
:Where will the log reside? As a special file in /, at the end of the
:partition, in another section of the disk? I take a transparent
:migration from normal UFS to journaled UFS will be provided, at least I
:hope so :)

    The log is just a file descriptor, which means that it could represent
    a special journaling device, a pipe to a process, a regular file, and 
    in particular it could represent a socket piping the journaled data
    to an off-site machine.

    The plan is to evolve this basic mechanism into a more sophisticated
    one as time passes, introducing a stream in the reverse direction to
    allow the journaling target to tell the journaling system when a
    piece of data has been physically committed to hard storage.  This
    information could in turn be fed back to a journal-aware filesystem
    but I would stress that awareness of the journal by the filesystem
    is not a requirement.  One can reap huge benefits from the journaling
    mechanism whether the filesystem is aware of it or not.

:Are you going to do this as a "black box" or provide an API for
:tuning/configuring? I had this idea of writing a journaling FS for
:FreeBSD some time ago, even wrote some code and one of my ideas was to

    The descriptor is roughly equivalent to the "black box".  The 
    kernel's journaling layer will not know or care where the journaled
    data is actually going.

:provide a mechanism for hinting the FS about what you wanted to do with
:the file (IIRC VxFS has some of these functions). The idea was that you
:could open any file and issue a set of ioctl calls saying e.g. I want RAW
:access to this file, no buffer cache involved (this one I called direct
:i/o), or I want this file deleted in a secure way, or I want a kernel
:event generated whenever an unauthorized user/program attempts to access
:the file. Stuff like that.

    This is the more traditional approach to journaling... that is, fully
    integrating it into the filesystem.  But that greatly reduces the
    flexibility of the journaling system, and given a properly designed
    VFS API it should not be necessary to integrate the journaling code
    into the filesystem at a thousand different points.  It would suffice
    to simply supply feedback to a journal-aware filesystem with regard
    to when the journaled data has been committed to hard storage.

:A problem I found was that concurrent access to the logging system
:exposes new problems you might not have taken into account when first
:designing such a subsystem, but I'm sure you've thought about that
:already. What's your solution for a mutex-less concurrent access to the
:log? Is it possible to do without mutexes at all?

    Yes and no.  Concurrent access to the journal 'stream' is easy: you
    just encapsulate the data into short-lived logical 'streams' (you'll
    see the protocol spec commit on that tomorrow, probably).  Each
    stream represents a transaction, which typically means a VFS operation
    (write, create, rmdir, mkdir, link, truncate, etc).

    Here is a quick example:

	multiple journaling sources, one target
	
	[------------------- IN MEMORY FIFO ----------------------]
	[strmid,bytes][stream data]...[strmid,bytes][stream data]...

	[id1][data] [id2][data] [id1][data] [id3][data] [id2][data] ...
		  ^		^
		  process 1	process 1
		  blocks	unblocks

    This allows both blocking and non-blocking transactions to be output
    to the journal as the data becomes available without tripping over 
    or stalling each other out.

    Mutexless operation is a different issue.  In the current scheme I am
    implementing there is a single memory FIFO going between the N processes
    issuing VFS ops and the (1) worker thread that is responsible for
    writing the journal out. 

    However, with the logical stream abstraction there is no reason why we
    couldn't have a per-cpu FIFO (one FIFO per cpu per journal).  Since
    DragonFly processes do not migrate between cpus while operating in the
    kernel, even if they block, the serialization abstraction that the
    logical stream mechanism provides would still work, and a per-cpu
    memory FIFO would not require any mutexes to operate.   The worker thread
    would collect the data from all available FIFOs rather than from a single
    FIFO.  I am not going to do this initially because it is unnecessarily
    complex at this early stage, but the API will easily support such an
    implementation.

:Other stuff I wanted to implement, but isn't really related to logging,
:was alternate streams (like NTFS has) and rich metadata, like BeFS had.
:What do you think about that?

    Sure, and you have the choice of either integrating the mechanism
    into a filesystem directly or providing an 'emulation' layer in the
    kernel that provides the same API but is able to run on top of
    filesystems that are not alternative-streams aware.

:I remember someone (maybe you) talking about per cylinder group dirty
:flags on UFS some time ago, to reduce fsck time. This could also be a
:nice addition, although this is definitely FS-dependent code.
    
    I believe this is doable for UFS and could greatly reduce fsck
    times.  It isn't on my hot list (too many other interesting things
    in my queue that I want to do).

:Like I've said, this looks very interesting and exciting area to work
:in. :)

    Yup!

:Cheers,
:-- 
:Miguel Mendez <flynn at xxxxxxxxxxxxxxxxxx>     | lea     gfx_lib(pc),a1
