Description of the Journaling topology
Maxim Sobolev
sobomax at FreeBSD.org
Thu Dec 30 16:10:38 PST 2004
Matthew Dillon wrote:
:All this work on the VFS layer looks very exciting and I agree with most
:of what you've said, especially the "Solaris did it that way" comment. To
:get to the point, I have a couple of questions about this
:implementation.
:
:Where will the log reside? As a special file in /, at the end of the
:partition, in another section of the disk? I take it that a transparent
:migration from normal UFS to journaled UFS will be provided, at least I
:hope so :)
The log is just a file descriptor, which means that it could represent
a special journaling device, a pipe to a process, a regular file, and
in particular it could represent a socket piping the journaled data
to an off-site machine.
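
As a rough userland illustration of "the log is just a file descriptor",
the sketch below writes the same record through one function to a pipe and
to a regular file. The length-prefixed record layout and the names
journal_write and journal_read are assumptions made for this example only;
they are not the actual kernel interface or journal format.

```python
import os
import struct
import tempfile

def journal_write(fd: int, op: str, path: str) -> None:
    """Append one length-prefixed record to whatever fd refers to.

    The record layout (4-byte big-endian length + text) is hypothetical.
    """
    payload = f"{op} {path}".encode()
    view = memoryview(struct.pack("!I", len(payload)) + payload)
    # os.write may accept fewer bytes than requested, so loop until done.
    while view:
        written = os.write(fd, view)
        view = view[written:]

def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes from fd or raise EOFError."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("journal stream ended early")
        buf += chunk
    return buf

def journal_read(fd: int) -> str:
    """Read back one record; used here only to check the demo."""
    (length,) = struct.unpack("!I", read_exact(fd, 4))
    return read_exact(fd, length).decode()

# Target 1: a pipe, standing in for "a pipe to a process".
r, w = os.pipe()
journal_write(w, "unlink", "foo")
pipe_record = journal_read(r)
os.close(r)
os.close(w)

# Target 2: a regular file; the writer code is unchanged.
with tempfile.TemporaryFile() as f:
    journal_write(f.fileno(), "unlink", "foo")
    os.lseek(f.fileno(), 0, os.SEEK_SET)
    file_record = journal_read(f.fileno())
```

The writer never inspects what kind of object sits behind the descriptor,
which is the property the description above relies on; a connected socket
would work the same way.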
The plan is to evolve this basic mechanism into a more sophisticated
one as time passes, introducing a stream in the reverse direction to
allow the journaling target to tell the journaling system when a
piece of data has been physically committed to hard storage. This
information could in turn be fed back to a journal-aware filesystem
but I would stress that awareness of the journal by the filesystem
is not a requirement. One can reap huge benefits from the journaling
mechanism whether the filesystem is aware of it or not.
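
One way to picture the planned reverse stream is a target that answers each
record with the sequence number it has made durable. The sketch below is
only a guess at the shape of such a protocol: the sequence numbers, the
wire layout, and the names send_record, target_commit_one and read_ack are
all hypothetical, and a Python list stands in for hard storage.

```python
import socket
import struct

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Receive exactly n bytes or raise EOFError."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the stream")
        buf += chunk
    return buf

def send_record(sock: socket.socket, seq: int, payload: bytes) -> None:
    """Forward direction: sequence number, length, then the payload."""
    sock.sendall(struct.pack("!QI", seq, len(payload)) + payload)

def target_commit_one(sock: socket.socket, storage: list) -> None:
    """Journaling target: store one record, then acknowledge it."""
    seq, length = struct.unpack("!QI", recv_exact(sock, 12))
    storage.append((seq, recv_exact(sock, length)))  # stand-in for a durable write
    sock.sendall(struct.pack("!Q", seq))             # reverse direction: ack

def read_ack(sock: socket.socket) -> int:
    """Journaling side: learn the sequence number the target committed."""
    (seq,) = struct.unpack("!Q", recv_exact(sock, 8))
    return seq

journal_side, target_side = socket.socketpair()
storage = []

send_record(journal_side, 1, b"unlink foo")
send_record(journal_side, 2, b"create foo")

# The target has committed only the first record so far.
target_commit_one(target_side, storage)
committed = read_ack(journal_side)

journal_side.close()
target_side.close()
```

After this exchange the journaling side knows that record 1 is on storage
and record 2 is not, which is the information that could be passed on to a
journal-aware filesystem.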
I think there is a basic synchronisation issue in such a topology.
Due to buffering, delays, etc., it is possible that in some cases the
filesystem will commit changes to permanent storage before the
corresponding journal entry is written out, e.g.:
1. App executes unlink("foo").
2. Kernel sends appropriate VOP to the filesystem and to the journal.
3. Filesystem commits metadata update, journal entry still sits
somewhere in the buffer.
4. App executes open("foo", O_CREAT).
5. Kernel sends appropriate VOP to the filesystem and to the journal.
6. Journaling system commits unlink() entry to the storage.
7. Filesystem commits metadata update, machine crashes before journal
entry for open() is committed.
On reboot, the kernel tries to replay the journal, and as a result the
already created file foo is lost. The same situation may happen for
subsequent writes and other operations: because the journal lags behind
the storage, it is possible that in the case of a failure some data
already written to the storage is lost.
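
The seven steps and the replay can be modelled in a few lines. This is a toy
simulation, not kernel code: the on-disk filesystem is a set of names, the
journal is a list of (op, name) pairs, and replay is assumed to re-apply
every committed entry blindly, which is the behaviour the scenario above
presupposes. It also assumes foo existed before step 1.

```python
def replay(fs_disk: set, journal_disk: list) -> set:
    """Re-apply every committed journal entry to the on-disk state."""
    state = set(fs_disk)
    for op, name in journal_disk:
        if op == "unlink":
            state.discard(name)
        elif op == "create":
            state.add(name)
    return state

fs_disk = {"foo"}      # what the filesystem has on permanent storage
journal_buffer = []    # journal entries not yet committed
journal_disk = []      # journal entries committed to storage

journal_buffer.append(("unlink", "foo"))    # steps 1-2: unlink("foo")
fs_disk.discard("foo")                      # step 3: filesystem commits first
journal_buffer.append(("create", "foo"))    # steps 4-5: open("foo", O_CREAT)
journal_disk.append(journal_buffer.pop(0))  # step 6: unlink entry committed
fs_disk.add("foo")                          # step 7: filesystem commits create
journal_buffer.clear()                      # crash: buffered create entry is gone

before_replay = set(fs_disk)                # foo is present on disk
after_replay = replay(fs_disk, journal_disk)  # stale unlink removes it again
```

Before replay the storage holds foo; after replay it does not, because the
only surviving journal entry is the unlink that the filesystem had already
moved past.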
How are you going to address this issue?
-Maxim