About FLUSH and SUJ.

Jeff Roberson jroberson at jroberson.net
Tue Jan 19 16:21:02 PST 2010


Hey Folks,

I read your thread about flush, softdep, and suj, and I wanted to take a 
moment to reply.  SUJ already has mechanisms to delay the acknowledgement 
of journal writes and also delay the freeing of journal space.  Using these 
mechanisms it wouldn't take longer than a day to add flush support.  The 
journal already tolerates the constituent metadata writes happening in any 
order.  All that is needed is a flush barrier before the dependencies are 
released and again before journal space is released.
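
To illustrate the ordering described above, here is a minimal sketch in 
Python.  It is not SUJ code: ToyDisk, journaled_update, and the step 
names are hypothetical, and the disk model assumes the worst case, where 
a power loss drops everything still in the write cache.

```python
class ToyDisk:
    """Hypothetical disk model with a volatile write cache."""

    def __init__(self):
        self.cache = {}   # acknowledged writes that are not yet durable
        self.stable = {}  # contents that survive a power loss

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        # A flush barrier: everything cached so far becomes durable.
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Worst-case assumption: nothing left in the cache reaches the platter.
        self.cache.clear()


def journaled_update(disk, record_id, metadata, stop_after=None):
    """Run one journaled update; optionally stop early to simulate a crash point."""
    steps = [
        # 1. Write the journal record naming the blocks about to change.
        ("journal write",
         lambda: disk.write(("journal", record_id), sorted(metadata))),
        # 2. Barrier before the dependencies are released.
        ("flush 1", disk.flush),
        # 3. The constituent metadata writes, which may happen in any order.
        ("metadata writes",
         lambda: [disk.write(("meta", blk), val) for blk, val in metadata.items()]),
        # 4. Barrier before journal space is released.
        ("flush 2", disk.flush),
        # 5. Only now is the journal record's space reclaimed.
        ("free journal space",
         lambda: disk.write(("journal", record_id), None)),
    ]
    done = []
    for name, action in steps:
        action()
        done.append(name)
        if name == stop_after:
            break
    return done
```

Stopping after "metadata writes" and then calling power_loss() leaves 
only the journal record on stable storage, so recovery still knows which 
blocks were in flight.  The journal record is reclaimed only after the 
second flush has made the metadata durable.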

I have not yet added flush support simply because it is not my priority. 
People with non-enterprise hardware are also the most likely to have disks 
that don't obey flush.  Furthermore, determining how well your drive 
honors flush is not a trivial task.  The existing fsck can be used in the 
event of a power failure, which is typically quite rare, to successfully 
recover the fs.  For those who want to tolerate some moderate slowdown at 
runtime, flush support could be enabled with a mount option.

Anyhow, my point is, it has not been overlooked, and it is simple to add, 
but it is not yet at the top of my todo list.  This is quite a large project 
with a lot of moving pieces and I'm still not yet finished.  I made sure 
the design did not preclude flush but I think it warrants more 
investigation.

I also wanted to comment on Matt's suggestion to put the segment header on 
every 512-byte block.  This is indeed a great idea and I was just in the 
process of revising the segment format to solve this problem.  Initially I 
had been working on a per-fs fragment basis, but there are cases where you 
must flush a journal entry before there is a fragment's worth of data, 
which leads to poor utilization.  So now it can write and recover a disk 
block at a time.
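
As a rough illustration of a header on every 512-byte block, here is a 
sketch in Python.  The layout is an assumption made up for this example 
(a sequence number, a payload length, and a CRC32) and is not the actual 
SUJ segment format; pack_block and recover are hypothetical names.

```python
import struct
import zlib

BLOCK = 512
# Hypothetical per-block header: sequence number, payload length, CRC32.
HDR = struct.Struct("<QHI")  # 8 + 2 + 4 = 14 bytes, little-endian, unpadded


def _checksum(seq, payload):
    # The CRC covers the sequence number, the length, and the payload.
    return zlib.crc32(struct.pack("<QH", seq, len(payload)) + payload)


def pack_block(seq, payload):
    """Build one self-describing 512-byte journal block."""
    if len(payload) > BLOCK - HDR.size:
        raise ValueError("payload does not fit in one block")
    block = HDR.pack(seq, len(payload), _checksum(seq, payload)) + payload
    return block.ljust(BLOCK, b"\0")


def recover(journal):
    """Scan block by block, keeping only blocks whose header checks out."""
    records = []
    for off in range(0, len(journal) - BLOCK + 1, BLOCK):
        block = journal[off:off + BLOCK]
        seq, length, crc = HDR.unpack_from(block)
        if length > BLOCK - HDR.size:
            continue  # implausible length: treat the block as invalid
        payload = block[HDR.size:HDR.size + length]
        if _checksum(seq, payload) == crc:
            records.append((seq, payload))
    return sorted(records)
```

Because each block carries its own header, a short entry can be written 
on its own without waiting for a fragment's worth of data, and a torn or 
stale block is skipped during recovery without affecting its neighbours.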

Thanks for the comments,
Jeff



