HAMMER changes in HEAD, also needs testing

Matthew Dillon dillon at apollo.backplane.com
Sat Jun 27 18:34:38 PDT 2009


:I've done some performance tests on a HEAD system from a few days before
:the change, and just after.  A ~100GB tar file was extracted to a local
:HAMMER file system, a cpdup copy of the files was made, and the result was
:traversed with find.  After updating the system to include the HAMMER &
:cpdup changes of June 20th, the same thing was done again.
:
:Unpacking and cpdup'ing were slower after updating, but find was faster.
:Also, after updating, traversing cpdup copies was faster for copies made
:before the update.
:
:This wasn't what I expected from the description of the changes, any
:comments?

    Yah, the B-Tree organization is a bit better after the change, so
    find is a bit faster, but there are still some major latencies
    in getdirentries (if you ktrace the find you can see where the
    hangups are).  Basically, getdirentries and the stat() of the
    first file in any given directory generate extra disk seeks
    and 18-40ms of latency.
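
    For example, a quick way to see that latency is to ktrace the find
    (a rough sketch using the stock ktrace/kdump tools; adjust paths to
    taste):

        # trace the find and any children it forks, then dump the trace
        # with relative timestamps and look for the big gaps around
        # getdirentries and the first stat in each directory
        ktrace -i -f find.ktrace find post > /dev/null
        kdump -R -f find.ktrace | less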

    The only way to improve those latencies is to move the B-Tree elements
    related to directory entries into the same localization block as the
    inode elements.  I can do that for newly created directories but they
    will not be compatible with older versions of HAMMER, so I have to
    test that it actually improves matters before I make it available.

    The extraction issue is another data locality of reference problem.
    I think I actually made it a bit worse, because the data blocks are
    getting a bit more spread out with the new B-Tree changes.  However,
    it shouldn't be as much worse as your tests came up with :-)

    Another thing you need to do is run a reblocking operation and even
    a rebalance and test the extraction again.  Find should get a bit
    better and extraction should remain about the same.
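
    Something along these lines should do it (assuming /hammer is the
    test filesystem and a reasonably current hammer(8)); both operations
    run on the live, mounted filesystem:

        # repack data and meta-data to improve locality, then rebalance
        # the B-Tree, and re-run the find/extraction tests afterwards
        hammer reblock /hammer
        hammer rebalance /hammer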

    I won't be able to tackle the extraction problem until I get the
    directory entry (find, ls -lR, etc) stuff dealt with.  The extra
    seeks done by the directory scans tend to blow away the hard drive's
    internal cache, so if I can reduce the seeks the extraction should
    get a lot faster.

:During tar file extraction after updating system, I noticed, looking at
:'hammer iostats 10', that inode-ops, file-rd & file-wr stalled for long
:periods of time, like 1-2 minutes; dev-read & dev-write were still high;
:after a stall inode-ops would rocket up, but only for half a minute,
:then a new stall started.  During a stall, it seemed all file system
:operations were also stalled, including processes that I can only imagine
:were using NFS (but this can't be true I guess).

    What is happening is that you are seeing the inode flush operation.
    This is why the device ops goes up and the frontend ops goes down.
    The meta-data builds up in memory and has to be flushed to disk.
    You can monitor this by looking at vfs.hammer.count_records
    and vfs.hammer.count_iqueued.  When too much has built up, the
    flush starts running in the background, but the rate at which files
    are being extracted is so high that the frontend runs up against the
    limit while the flush is running and stalls out.  It shouldn't take
    1-2 minutes per flush, but it can get pretty nasty when lots of tiny
    files are being created.
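
    A trivial way to watch this while the extraction runs is to poll
    those two sysctls, e.g.:

        # sample the in-memory record count and queued-inode count every
        # 10 seconds; they climb until the background flush drains them
        while :; do
                sysctl vfs.hammer.count_records vfs.hammer.count_iqueued
                sleep 10
        done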

    The meta-data flush is not very efficient.  Part of the problem is
    the way the inodes are sequenced but it's a tough nut to crack no
    matter how I twist it, particularly when extracting large directory
    trees.  Large directory trees have tons of directory entry dependencies
    and the flush operation has to flush the directory entry data in a
    particular order to ensure that crash recovery doesn't leave
    sub-directories disconnected.

    I think, generally speaking, a tar extraction is never going to be
    very efficient with HAMMER.  I think tar archive creation and read
    scans can be made considerably more efficient.

:- BEFORE
:root at bohr# time tar xf /hammer/data/hammer.tar
:    15996.71 real       207.43 user      1768.43 sys
:root at bohr# time cpdup pre pre.cpdup
:    43937.88 real       206.70 user      4339.66 sys
:root at bohr# time find pre | gzip >pre.find.gz
:      847.99 real        19.55 user       185.58 sys
:root at bohr# time find pre -ls | gzip >pre.find.-ls.gz
:     1171.63 real        88.60 user       501.59 sys

:- HEAD system from June 20th, including HAMMER changes
:root at bohr# time tar xf /hammer/data/hammer.tar
:    21740.77 real       203.77 user      1752.30 sys
:root at bohr# time cpdup post post.cpdup
:    72476.51 real       204.83 user      4041.41 sys
:root at bohr# time find post | gzip >post.find.gz
:      488.17 real        13.37 user       126.37 sys
:root at bohr# time find post -ls | gzip >post.find.-ls.gz
:      854.08 real        65.07 user       397.89 sys

    I think there might be another issue with those tar xf
    and cpdup times.  That's too big a difference.  Part of it
    could be the fact that as the disk fills up you are probably
    writing to inner disk cylinders, which have considerably less
    bandwidth than the outer cylinders, but even that doesn't
    account for a 30,000 second difference.

    Make sure that other stuff isn't going on, like an automatic
    cleanup.  You ran your test over 20 hours.  I'm guessing
    that the hammer cleanup cron job ran during the test.

    Also, the daily locate.db cron job probably ran as well.  You
    have to disable that too because it will probably take forever
    to run through a partition full of test files.
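
    The easiest way to rule those out is to disable them for the duration
    of the test, e.g. via /etc/periodic.conf (knob names from memory,
    check /etc/defaults/periodic.conf on your system):

        # temporarily disable the periodic hammer cleanup and the
        # locate database rebuild while the benchmark runs
        daily_clean_hammer_enable="NO"
        weekly_locate_enable="NO"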

:root at bohr# time find pre.cpdup | gzip >pre.cpdup.post.find..gz
:      362.19 real        14.24 user       124.45 sys
:root at bohr# time find pre.cpdup -ls | gzip >pre.cpdup.post.find.-ls.gz
:      619.08 real        65.47 user       314.53 sys
:root at bohr# time find post.cpdup | gzip >post.cpdup.post.find..gz
:      450.81 real        14.18 user       134.78 sys
:root at bohr# time find post.cpdup -ls | gzip >post.cpdup.post.find.-ls.gz
:      813.99 real        68.53 user       408.46 sys

    These are more in line.  After a reblock/rebalance the finds
    should go a bit faster.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




