Background fsck

Matthew Dillon dillon at apollo.backplane.com
Mon Jan 19 16:34:36 PST 2004


:Joerg Sonnenberger wrote:
:> 4 MB pieces looked in the different areas of the disk. The option to
:> split the journal into pieces becomes useful if you have RAID or similar
:> means in place, which can affect the overall performance dramatically.
:
:I think it could benefit performance putting a journal on a faster RAID 1 
:mirror if you have a huge RAID5 array. Remember that if you have one of
:those expensive RAID controllers with battery-backed write-back cache,
:you should have the FS synced so that you have the speed-up
:of the write-back cache AND data consistency guarantee in a crash. softupdates
:alone guarantee metadata consistency in a crash only. You still lose
:data.

    You would still lose data with a meta-data-only journal, just not as
    much.  But meta-data logs have a lot of flexibility: it would be
    possible to make certain guarantees, and it is also possible to log
    file data as well as meta-data when the circumstances dictate.
    Consider the following sequence of operations:

	create file
	write 8k
	write 8k
	write 8k
	...
	close

    In the journal this would be (the T numbers on the left indicate atomic
    transactions):

	T1 inode meta data reflecting a directory size change
	T1 directory data reflecting the new directory entry
	T1 bitmap meta data reflecting the inode allocation
	T1 inode meta data reflecting the create
	T2 block allocation meta data
	T2 inode meta data reflecting a file size change to 8K
	T3 block allocation meta data
	T3 inode meta data reflecting a file size change to 16K
	T4 block allocation meta data
	T4 inode meta data reflecting a file size change to 24K
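
    As a rough illustration of the grouping above, here is a toy
    in-memory sketch in Python.  The Journal class, its method names,
    and the last_committed parameter are all hypothetical, invented
    for this example; it only shows that recovery applies a
    transaction entirely or not at all, and says nothing about an
    on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy journal (hypothetical): records are grouped by transaction id."""
    records: list = field(default_factory=list)   # (txid, description), in log order
    committed: set = field(default_factory=set)   # txids whose commit reached the log

    def log(self, txid, description):
        self.records.append((txid, description))

    def commit(self, txid):
        self.committed.add(txid)

    def replay(self):
        # Recovery keeps only records belonging to committed transactions.
        return [(t, d) for (t, d) in self.records if t in self.committed]


def journal_create_and_write(journal, n_writes, last_committed, block_kb=8):
    """Log the create as T1 and each 8k write as T2, T3, ...

    Transactions numbered above last_committed are logged but never
    committed, which simulates a crash at that point.
    """
    journal.log(1, "inode: directory size change")
    journal.log(1, "directory data: new entry")
    journal.log(1, "bitmap: inode allocation")
    journal.log(1, "inode: create")
    if last_committed >= 1:
        journal.commit(1)
    for i in range(1, n_writes + 1):
        txid = i + 1
        journal.log(txid, "block allocation")
        journal.log(txid, f"inode: file size {i * block_kb}K")
        if txid <= last_committed:
            journal.commit(txid)


j = Journal()
journal_create_and_write(j, n_writes=3, last_committed=3)  # crash before T4 commits
recovered = j.replay()
```

    With T4 uncommitted, the replayed log ends at the T3 inode update,
    so the recovered file is 16K rather than 24K.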

    Directory data can be written at any point after T1 commits.  That is,
    directory data has to be treated the same as meta-data.  File
    block data can be written at any time, before or after the log is
    committed, as long as the blocks are newly allocated and not a
    reuse of old data blocks that were recently deleted (the reuse case
    can be solved by not actually marking blocks in the bitmap as 'free'
    until after the associated meta-data is committed).
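
    The deferred-free idea in the parenthetical can be sketched as
    follows.  This is a minimal Python model under assumed names
    (BlockBitmap, release, on_commit are hypothetical); a real
    allocator would be considerably more involved.

```python
class BlockBitmap:
    """Toy allocator (hypothetical) that defers frees until commit."""

    def __init__(self, nblocks):
        self.free = set(range(nblocks))
        self.pending_free = {}      # txid -> blocks released by that transaction

    def allocate(self):
        # Only blocks in self.free are candidates, so a block released by
        # a not-yet-committed transaction can never be handed out here.
        if not self.free:
            raise RuntimeError("no free blocks")
        blk = min(self.free)
        self.free.remove(blk)
        return blk

    def release(self, txid, blk):
        # Remember the release, but do not mark the block free yet.
        self.pending_free.setdefault(txid, []).append(blk)

    def on_commit(self, txid):
        # The meta-data describing the delete is now safely in the log,
        # so the released blocks may be reused.
        self.free.update(self.pending_free.pop(txid, []))


bm = BlockBitmap(4)
a = bm.allocate()            # block 0
bm.release(txid=7, blk=a)    # deleted, but transaction 7 is not committed
b = bm.allocate()            # block 1: block 0 is still held back
bm.on_commit(7)
c = bm.allocate()            # block 0 is reusable only now
```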

    What this does not do is tell us that the data blocks are good.  If
    we were to crash, the file could be the correct size but contain
    'garbage'.

    One solution to the file-data-garbage problem would be to log file
    data block write completions as meta-data:

	...
	T80 block write complete
	T81 block write complete
	T82 block write complete

    Then the crash recovery program could notice that a file block was
    allocated but never written and either write out 0's to that block,
    or truncate the file.
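
    A minimal sketch of that recovery decision, in Python.  The
    function name, its arguments, and the 8192-byte block size are
    assumptions made for illustration, as is the choice to truncate at
    the first unwritten block; none of this is an existing tool.

```python
def recover_file(block_list, written, block_size=8192, policy="truncate"):
    """Decide how to repair one file after a crash (hypothetical helper).

    block_list: blocks allocated to the file, in file order, taken from
                committed allocation transactions.
    written:    set of blocks with a logged 'block write complete'.
    Returns (new_size_in_bytes, blocks_to_zero).
    """
    if policy == "zero":
        # Keep the file size; overwrite never-written blocks with 0's.
        to_zero = [b for b in block_list if b not in written]
        return len(block_list) * block_size, to_zero
    if policy == "truncate":
        # Cut the file at the first block whose write never completed.
        keep = 0
        for b in block_list:
            if b not in written:
                break
            keep += 1
        return keep * block_size, []
    raise ValueError(f"unknown policy: {policy}")


blocks = [10, 11, 12]        # three 8k blocks were allocated
done = {10, 12}              # block 11 has no 'write complete' record
zeroed = recover_file(blocks, done, policy="zero")
truncated = recover_file(blocks, done, policy="truncate")
```

    Zeroing preserves the logged file size at the cost of a hole of
    0's; truncating loses block 12 even though it was written.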

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




