Description of the Journaling topology

Maxim Sobolev sobomax at
Thu Dec 30 18:19:18 PST 2004

Matthew Dillon wrote:
:I think that there is a basic synchronisation issue in such a topology.
:Due to buffering, delays, etc. it is possible that in some cases the
:filesystem will commit changes to permanent storage before the
:corresponding journal entry is created, i.e.:
:1. App executes unlink("foo").
:2. Kernel sends appropriate VOP to the filesystem and to the journal.
:3. Filesystem commits metadata update, journal entry still sits 
:somewhere in the buffer.
:4. App executes open("foo", O_CREAT).
:5. Kernel sends appropriate VOP to the filesystem and to the journal.
:6. Journaling system commits unlink() entry to the storage.
:7. Filesystem commits metadata update, machine crashes before journal 
:entry for open() is committed.
:On reboot, the kernel tries to replay the journal and, as a result, the
:already created file foo is lost. The same situation may happen for
:subsequent writes and other operations: because the journal lags behind
:the storage, it is possible that after a failure some data already
:written to the storage is lost.
:How are you going to address this issue?
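
The seven steps above can be played through with a toy model. This is only
a sketch: the file set, the two journal lists and the replay() helper are
all made up for illustration and do not correspond to any real kernel
interface. It assumes replay simply re-applies every stable journal entry
to whatever is on disk.

```python
def replay(disk_files, journal):
    """Apply stable journal entries, in order, to the on-disk file set."""
    files = set(disk_files)
    for op, name in journal:
        if op == "unlink":
            files.discard(name)
        elif op == "create":
            files.add(name)
    return files

disk = {"foo"}          # on-disk state before step 1
journal_buffer = []     # entries generated but not yet on stable storage
journal_stable = []     # entries committed to journal storage

# Steps 1-3: the unlink reaches the disk while its journal entry is buffered.
journal_buffer.append(("unlink", "foo"))
disk.discard("foo")

# Steps 4-5: open(O_CREAT) is issued; its journal entry is buffered as well.
journal_buffer.append(("create", "foo"))

# Step 6: only the unlink entry makes it to journal storage.
journal_stable.append(journal_buffer.pop(0))

# Step 7: the filesystem commits the create, then the machine crashes.
disk.add("foo")
journal_buffer.clear()  # buffered entries are lost in the crash

# On reboot the stable journal is replayed over the on-disk state.
recovered = replay(disk, journal_stable)
```

In this model foo is on disk at the time of the crash, but the replayed
unlink removes it again, which is the loss described above.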
    Solving this issue requires the filesystem to be aware of the journal's
    existence, which I've mentioned in past posts.  The filesystem would
    have to buffer related disk operations until it gets positive
    confirmation that the related journal entries have been committed.
    This is similar to what softupdates does, but the implementation 
    would not have to be anywhere near as sophisticated.
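
A minimal sketch of that ordering rule, assuming a toy in-memory model.
The class JournalAwareFS, the sequence numbers and the journal_committed()
callback are hypothetical names invented for this example, not an existing
interface. The only point is that a disk operation is held back until the
journal has confirmed the matching entry.

```python
class JournalAwareFS:
    """Toy filesystem that holds disk updates until the journal confirms them."""

    def __init__(self):
        self.disk = set()     # names present on "permanent storage"
        self.pending = []     # (seq, op, name) tuples, in issue order
        self.next_seq = 1

    def submit(self, op, name):
        """Queue an operation; return the journal sequence number it waits on."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, op, name))
        return seq

    def journal_committed(self, seq):
        """The journal reports that every entry up to and including seq is stable."""
        ready = [p for p in self.pending if p[0] <= seq]
        self.pending = [p for p in self.pending if p[0] > seq]
        for _, op, name in ready:
            if op == "create":
                self.disk.add(name)
            elif op == "unlink":
                self.disk.discard(name)

fs = JournalAwareFS()
fs.disk.add("foo")
s1 = fs.submit("unlink", "foo")
s2 = fs.submit("create", "foo")
disk_before = set(fs.disk)         # nothing has reached the disk yet
fs.journal_committed(s1)
disk_after_unlink = set(fs.disk)   # the unlink is applied only now
fs.journal_committed(s2)
```

With this ordering the disk never gets ahead of the stable journal, so a
replay after a crash cannot undo something the journal has not seen.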

    Barring that, you might not be able to guarantee that an incremental
    playback from the journal would be sufficient to fully recover the
    filesystem.  But even in that case a full restore from backups and a full
    playback from the journal would be able to fully recover the
    filesystem up to N seconds prior to the crash.  It would just take longer.
    So the basic property of being able to restore to within N seconds can
    still be guaranteed even without a journal-aware filesystem.
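
As a rough illustration of the fallback, here is a toy model of a full
restore followed by a full playback. The function full_recover() and the
sample file names are assumptions made up for the example. The on-disk
state at the time of the crash is ignored entirely; only the backup and
the stable journal are used.

```python
def full_recover(backup_files, full_journal):
    """Start from the backup and replay every stable journal entry in order."""
    files = set(backup_files)
    for op, name in full_journal:
        if op == "unlink":
            files.discard(name)
        elif op == "create":
            files.add(name)
    return files

backup = {"foo", "bar"}   # state captured by the last backup
# Journal entries that reached stable storage since that backup.
# A later create of "foo" was still buffered at the crash and is not here.
journal = [("unlink", "foo"), ("create", "baz")]

recovered = full_recover(backup, journal)
```

The result is the filesystem as of the last stable journal entry, so
anything issued after that entry (the second foo in this example) is
missing, but the state itself is consistent.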
Yes, this is what I am talking about. So you can forget about fast
recovery of the filesystem into a consistent state after a crash, which is
one of the selling points of today's journaled filesystems.

