Description of the Journaling topology

Maxim Sobolev sobomax at FreeBSD.org
Thu Dec 30 16:10:38 PST 2004


Matthew Dillon wrote:
:All this work on the VFS layer looks very exciting and I agree with most
:of what you've said, specially the "Solaris did it that way" comment. To
:get to the point, I have a couple of questions about this
:implementation.
:
:Where will the log reside? As a special file in /, at the end of the
:partition, in another section of the disk? I take a transparent
:migration from normal UFS to journaled UFS will be provided, at least I
:hope so :)
    The log is just a file descriptor, which means that it could represent
    a special journaling device, a pipe to a process, a regular file, and 
    in particular it could represent a socket piping the journaled data
    to an off-site machine.

    The plan is to evolve this basic mechanism into a more sophisticated
    one as time passes, introducing a stream in the reverse direction to
    allow the journaling target to tell the journaling system when a
    piece of data has been physically committed to hard storage.  This
    information could in turn be fed back to a journal-aware filesystem
    but I would stress that awareness of the journal by the filesystem
    is not a requirement.  One can reap huge benefits from the journaling
    mechanism whether the filesystem is aware of it or not.
I think there is a basic synchronisation issue in such a topology.
Due to buffering, delays, etc., it is possible that in some cases the
filesystem will commit changes to permanent storage before the
corresponding journal entry is committed, e.g.:

1. App executes unlink("foo").
2. Kernel sends appropriate VOP to the filesystem and to the journal.
3. Filesystem commits metadata update, journal entry still sits 
somewhere in the buffer.
4. App executes open("foo", O_CREAT).
5. Kernel sends appropriate VOP to the filesystem and to the journal.
6. Journaling system commits unlink() entry to the storage.
7. Filesystem commits metadata update, machine crashes before journal 
entry for open() is committed.

On reboot, the kernel tries to replay the journal, and as a result the
already created file foo is lost. The same situation may arise for
subsequent writes and other operations: because the journal lags
behind the storage, a failure can cause some data already written to
the storage to be lost.
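
The sequence above can be sketched as a toy model. This is only an
illustration under my own assumptions, not the actual journaling code:
the names disk, journal_buffer, durable_journal and replay() are
hypothetical, and replay is assumed to simply re-apply every durable
journal record to the on-disk state found after the crash.

```python
# Toy model of the ordering problem; all names here are hypothetical.
# Assumption: replay re-applies durable journal records to the on-disk state.

def replay(durable_journal, names_on_disk):
    """Apply each durable journal record, in order, to the given name set."""
    names = set(names_on_disk)
    for op, name in durable_journal:
        if op == "unlink":
            names.discard(name)
        elif op == "create":
            names.add(name)
    return names


def run_scenario():
    disk = {"foo"}         # names the filesystem has committed to storage
    journal_buffer = []    # journal records emitted but not yet durable
    durable_journal = []   # journal records committed by the journal target

    # Steps 1-3: unlink("foo"); the filesystem commits, the record is buffered.
    journal_buffer.append(("unlink", "foo"))
    disk.discard("foo")

    # Steps 4-5: open("foo", O_CREAT); its record queues behind the unlink.
    journal_buffer.append(("create", "foo"))

    # Step 6: the journaling system commits only the unlink record.
    durable_journal.append(journal_buffer.pop(0))

    # Step 7: the filesystem commits the create, then the machine crashes,
    # so whatever is still in journal_buffer is gone.
    disk.add("foo")
    journal_buffer.clear()

    return disk, replay(durable_journal, disk)


disk_at_crash, after_replay = run_scenario()
print("on disk at crash:", sorted(disk_at_crash))  # ['foo']
print("after replay:    ", sorted(after_replay))   # []
```

In this model foo exists on disk at the moment of the crash, but the
only durable journal record is the stale unlink, so replaying it
removes the file again.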

How are you going to address this issue?

-Maxim




