Description of the Journaling topology

Matthew Dillon dillon at apollo.backplane.com
Tue Dec 28 13:04:50 PST 2004


:Barely understanding the implications of this concept, it strikes me as
:mostly logical, clean and relatively simple.
:Which makes me curious why other projects haven't done this already?
:What is the major reason that other projects follow a different path than
:this one?
:
:-- 
:mph

    The concepts aren't new but my recollection is that most journaling
    implementations are directly integrated into the filesystem and this
    tends to limit their flexibility.  Making the journaling a kernel 
    layer and taking into account forward-looking goals really opens up
    the possibilities.  Forward-looking is not something that people are
    generally good at in either the open-source or the commercial world.
    (Cases in point: why ext3 is such a mess, and why existing journaling
    implementations are so limited in scope.)

    Generally speaking, open-source OS projects have been severely lacking
    with regard to the construction of better backup paradigms, mostly
    relying on hardware (e.g. RAID) and external technologies (e.g. NetApp),
    or relying on major assumptions with regard to disk data reliability
    (which are no longer true) (e.g. Ext3Fs, Reiser), or block-level
    snapshots (softupdates) which are kludgy.  External utilities like
    dump and tar have no realtime capabilities whatsoever and aren't even
    reliable when used as designed if the filesystem is being modified
    while a dump/tar is in progress.

    None of these integrated technologies really give me any peace of mind.  
    My number one desire is to have a technology that can give the sysop 
    actual peace of mind that his systems aren't going to crash and burn
    beyond any chance of recovery, be it through a software bug, disk crash,
    building fire, or intentional destruction (hackers).

    Our journaling layer is designed to address these issues.  Providing a
    high-level filesystem operations change stream off-site is far more
    robust than providing a block device level change stream.  Being able
    to go off-site in real-time to a secure (or more secure) machine can't
    be beat.  Being able to rewind the journal to any point in time, 
    infinitely fine-grained, gives security managers and sysops and even
    users an incredibly powerful tool for deconstructing security events
    (e.g. log file erasures), recovering lost data, and so on and so forth. 
    These are very desirable traits, yah?
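
    To make the rewind idea concrete, here is a minimal sketch of replaying
    an operation-level journal up to a chosen point in time.  It is only an
    illustration, not the DragonFly journal format: the record layout, the
    three operation names, and the replay() helper are all assumptions made
    for this example, and writes are simplified to whole-file replacement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalRecord:
    """One high-level filesystem operation (hypothetical layout)."""
    timestamp: float                 # when the operation was committed
    op: str                          # "write", "remove", or "rename"
    path: str                        # file the operation applies to
    data: Optional[bytes] = None     # new contents, for "write"
    new_path: Optional[str] = None   # destination, for "rename"


def replay(journal: List[JournalRecord], until: float) -> Dict[str, bytes]:
    """Rebuild the file contents as they stood at time `until`."""
    files: Dict[str, bytes] = {}
    for rec in sorted(journal, key=lambda r: r.timestamp):
        if rec.timestamp > until:
            break  # everything after this point is "in the future"
        if rec.op == "write":
            # Simplification: a write replaces the whole file.
            files[rec.path] = rec.data if rec.data is not None else b""
        elif rec.op == "remove":
            files.pop(rec.path, None)
        elif rec.op == "rename":
            if rec.new_path is None:
                raise ValueError("rename record without new_path")
            if rec.path in files:
                files[rec.new_path] = files.pop(rec.path)
        else:
            raise ValueError(f"unknown operation: {rec.op!r}")
    return files


# An intruder logs in, then erases the log file to cover his tracks.
journal = [
    JournalRecord(1.0, "write", "/var/log/auth.log", b"login root\n"),
    JournalRecord(2.0, "write", "/etc/motd", b"hi\n"),
    JournalRecord(3.0, "remove", "/var/log/auth.log"),
]

before_erasure = replay(journal, until=2.5)   # log file still present
after_erasure = replay(journal, until=3.0)    # log file gone
```

    Because the stream records operations rather than disk blocks, the
    off-site copy can be replayed to just before the erasure without knowing
    anything about the on-disk layout of the filesystem that produced it.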

    --

    So why hasn't it been done or, at least, why isn't it universal after all
    these years?

    It's a good question.  I think it comes down to how most programmers
    have been educated over the years.  It's funny, but whenever I build
    something new the first question I usually get is "what paper is your
    work based on?".  I get it every time, without fail.  And every time,
    without fail, I find myself trying to explain to the questioner that
    I generally do not bother to *READ* research papers...  that I build 
    systems from scratch based on one or two sentences' worth of concept.

    If I really want to throw someone for a loop I ask him whether he'd
    rather be the guy inventing the algorithm and writing the paper, or
    the guy implementing it from the paper.  It's a question that forces
    the questioner to actually think with his noggin.

    I think that is really the crux of the problem... programmers have been
    taught to build things from templates rather than build things from
    concepts... and THAT is primarily why software is still stuck in the 
    dark ages insofar as I am concerned.  True innovation requires having
    lightbulbs go off above your head all the time, and you don't get that
    from reading papers.  Another amusing anecdote... every time I complained
    about something in FreeBSD-5 or 6 the universal answer I got was that
    'oh, well, Solaris did it this way' or 'there was a paper about this'
    or a myriad of other 'someone else wrote it down so it must be good'
    excuses.  Not once did I ever get any other answer.  Pretty sad, I think,
    and also sadly not unique to FreeBSD.  It's a problem with mindset, and
    mindset is a problem with our educational system (the entire world's).

    I'm really happy that DragonFly has finally progressed to the point where
    we can begin to implement our loftier goals.  Up until now the work has
    been primarily ripping out and reimplementing the guts of the system with
    very little visibility poking through to the end-user.  Now we
    are starting to push into things that have direct consequences to the
    end-user.  The journaling is one of the three major legs that will
    support the ultimate goal of single-system-image clustering.  The second
    leg is a cache coherency scheme, and the third will be resource sharing
    and migration.  All three will have to be very carefully and deliberately
    integrated together into a single whole to achieve the ultimate goal.

    This makes journaling a major turning point for the project... one,
    I hope, that attracts more people to DragonFly.

						-Matt





