About FLUSH and SUJ.
jroberson at jroberson.net
Tue Jan 19 16:21:02 PST 2010
I read your thread about flush, softdep, and suj, and I wanted to take a
moment to reply. SUJ already has mechanisms to delay the acknowledgement
of journal writes and also delay the freeing of journal space. Using these
mechanisms it wouldn't take longer than a day to add flush support. The
journal already tolerates the constituent metadata writes happening in any
order. All that is needed is a flush barrier before the dependencies are
released and again before journal space is released.
I have not yet added flush support simply because it is not my priority.
People with non-enterprise hardware are also the most likely to have disks
that don't obey flush. Furthermore, determining how well your drive
honors flush is not a trivial task. The existing fsck can be used in the
event of a power failure, which is typically quite rare, to successfully
recover the fs. For those who want to tolerate some moderate slowdown at
runtime, flush support could be enabled with a mount option.
Anyhow, my point is, it has not been overlooked, and it is simple to add,
but it is not yet at the top of my todo list. This is quite a large project
with a lot of moving pieces and I'm still not yet finished. I made sure
the design did not preclude flush but I think it warrants more
I also wanted to comment on Matt's suggestion to put the segment header on
every 512-byte block. This is indeed a great idea and I was just in the
process of revising the segment format to solve this problem. Initially I
had been working on a per-fs fragment basis but there are cases where you
must flush a journal entry before there is a fragment's worth of data,
leading to poor utilization. So now it can write and recover a disk block
at a time.
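
To illustrate what block-at-a-time recovery could look like, here is a
rough sketch. It is not the actual SUJ segment format: the header fields
(magic, sequence number, payload length, checksum), the magic value, and
the names blk_pack and journal_scan are all assumptions made up for this
example. The only thing taken from the discussion is that every 512-byte
block carries its own header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLK_SIZE = 512 };
#define HYP_MAGIC 0x4a524e4cu   /* arbitrary tag, hypothetical */

/* Hypothetical per-block header; each 512-byte block is self-describing. */
struct blk_hdr {
    uint32_t magic;
    uint32_t seq;      /* position of this block in the journal stream */
    uint32_t nbytes;   /* payload bytes actually used in this block */
    uint32_t cksum;    /* checksum over the used payload */
};

enum { PAYLOAD_MAX = BLK_SIZE - sizeof(struct blk_hdr) };

/* Simple multiplicative checksum, a placeholder for a real one. */
static uint32_t cksum(const uint8_t *p, uint32_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 0; i < n; i++)
        s = s * 31u + p[i];
    return s;
}

/* Build one journal block, even if the payload is far smaller than a fragment.
 * The caller must pass n <= PAYLOAD_MAX. */
static void blk_pack(uint8_t *blk, uint32_t seq, const void *data, uint32_t n)
{
    struct blk_hdr h = { HYP_MAGIC, seq, n, 0 };

    memset(blk, 0, BLK_SIZE);
    memcpy(blk + sizeof(h), data, n);
    h.cksum = cksum(blk + sizeof(h), n);
    memcpy(blk, &h, sizeof(h));
}

/* Recovery: count the leading blocks that are valid and in sequence. */
static size_t journal_scan(const uint8_t *jrnl, size_t nblks, uint32_t first_seq)
{
    size_t i;

    for (i = 0; i < nblks; i++) {
        const uint8_t *blk = jrnl + i * BLK_SIZE;
        struct blk_hdr h;

        memcpy(&h, blk, sizeof(h));
        if (h.magic != HYP_MAGIC || h.seq != first_seq + (uint32_t)i ||
            h.nbytes > (uint32_t)PAYLOAD_MAX ||
            h.cksum != cksum(blk + sizeof(h), h.nbytes))
            break;   /* stop at the first block that does not validate */
    }
    return i;
}
```

Because each block validates on its own, a short journal entry can be
written in a single 512-byte block without waiting for a full fragment,
and a scan can stop cleanly at the first block that fails its checks.
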
Thanks for the comments,