[issue1556] many processes stuck in "hmrrcm", system unusable
Matthew Dillon
dillon at apollo.backplane.com
Wed Oct 7 10:59:15 PDT 2009
What we have here is a situation where corecode's xterm+shell startup
is accessing somewhere north of 900 files for various reasons. Big
programs with many shared libraries are getting run. If those files
get knocked out of the cache, the startup is going to be slow, and
that is exactly what is happening.
HAMMER v2 is better at doing directory lookups, but most of the time
seems to be spent searching the B-Tree for the first file data
block... it doesn't take a large percentage of misses across the
900 files to balloon the startup into multiple seconds. UFS happens
to have a direct blockmap from the inode, whereas HAMMER caches an
offset to the disk block containing the B-Tree entry most likely to
hold the file data reference, so HAMMER depends a lot more on its
B-Tree meta-data caches not getting blown out of the system.
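To illustrate the offset-cache idea, here is a toy, self-contained
sketch (not actual HAMMER code; the structures and names are made up).
The point is simply that a cached hint turns the common case into a
single block check, while a miss falls back to the full B-Tree descent
and refreshes the hint:

    /*
     * Toy sketch of the per-inode offset hint described above.  The
     * "B-Tree" is reduced to a sorted array of leaves; full_lookup()
     * stands in for the root-to-leaf descent.  All names hypothetical.
     */
    #include <stdio.h>
    #include <stdint.h>

    struct leaf {
        int64_t base;           /* first file offset covered by this leaf */
        int64_t len;            /* bytes covered */
    };

    static struct leaf leaves[] = {
        { 0, 65536 }, { 65536, 65536 }, { 131072, 65536 }, { 196608, 65536 },
    };
    #define NLEAVES ((int)(sizeof(leaves) / sizeof(leaves[0])))

    /* Stand-in for the full B-Tree descent. */
    static int
    full_lookup(int64_t off)
    {
        int lo = 0, hi = NLEAVES - 1;

        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (off < leaves[mid].base)
                hi = mid - 1;
            else if (off >= leaves[mid].base + leaves[mid].len)
                lo = mid + 1;
            else
                return mid;
        }
        return -1;
    }

    /*
     * Per-inode hint: index of the leaf that satisfied the last lookup.
     * A hit costs one range check; a miss does the descent and then
     * refreshes the hint.
     */
    static int
    hinted_lookup(int64_t off, int *hint)
    {
        int i = *hint;

        if (i >= 0 && i < NLEAVES &&
            off >= leaves[i].base && off < leaves[i].base + leaves[i].len)
            return i;
        *hint = full_lookup(off);
        return *hint;
    }

    int
    main(void)
    {
        int hint = -1;

        printf("leaf %d (full descent)\n", hinted_lookup(70000, &hint));
        printf("leaf %d (hint hit)\n", hinted_lookup(80000, &hint));
        return 0;
    }

The hint only helps for as long as the underlying meta-data stays
cached; once those B-Tree pages are blown out, every one of those ~900
files pays for a full descent again.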
Some 400,000 files get accessed when using rdist or cvs to update
something like the NetBSD CVS repo (corecode's test). I can prevent
the vnodes used to read files from getting blown out by the vnodes
used to stat files, but vnodes are not thrown away unless their
related VM pages are thrown away, so a VM page priority adjustment
probably also needs to be made to retain the longer-cached meta-data
in the face of multi-gigabyte directory tree scans.
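To make the page priority point concrete, here is a toy sketch
(hypothetical, not DragonFly kernel code): pages that keep getting
referenced accumulate an activity count, a one-pass scan only touches
each page once, and the reclaimer picks the lowest-activity resident
pages first, so the long-cached meta-data survives the scan:

    /*
     * Toy sketch: activity-based eviction.  "Meta-data" pages get
     * referenced over and over; scan pages are touched exactly once.
     */
    #include <stdio.h>

    #define NPAGES 8

    struct page {
        int id;
        int act;            /* activity count, bumped on each reference */
        int resident;       /* 1 while cached */
    };

    static struct page pages[NPAGES];

    static void
    reference(struct page *p)
    {
        if (p->act < 64)    /* clamp the activity count */
            p->act += 4;
    }

    /* Reclaim one page: pick the resident page with the lowest activity. */
    static struct page *
    reclaim_one(void)
    {
        struct page *victim = NULL;

        for (int i = 0; i < NPAGES; i++) {
            if (!pages[i].resident)
                continue;
            if (victim == NULL || pages[i].act < victim->act)
                victim = &pages[i];
        }
        if (victim != NULL)
            victim->resident = 0;
        return victim;
    }

    int
    main(void)
    {
        for (int i = 0; i < NPAGES; i++)
            pages[i] = (struct page){ .id = i, .resident = 1 };

        /* Pages 0-1 are "meta-data": referenced many times over time. */
        for (int pass = 0; pass < 5; pass++) {
            reference(&pages[0]);
            reference(&pages[1]);
        }
        /* Pages 2-7 are a read-once scan: each touched exactly once. */
        for (int i = 2; i < NPAGES; i++)
            reference(&pages[i]);

        /* Memory pressure: the scan data goes first, not the meta-data. */
        for (int i = 0; i < 4; i++)
            printf("evicted page %d\n", reclaim_one()->id);
        return 0;
    }

The real VM system is obviously more involved than that, but the
priority adjustment amounts to biasing this kind of ordering in favor
of buffer-cache meta-data.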
Something corecode is doing from cron is physically reading (not just
stat()ing) a large number of files.
I will make some adjustments to the VM page priority for meta-data
that the buffer cache returns to the VM system, as well as some
adjustments to the vnode reclamation code, to reduce instances of
long-lived file vnodes getting blown out by read-once data.
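On the vnode side, one way to picture that kind of adjustment (again
just an illustrative sketch with made-up names, not the actual
reclamation code) is two recycle queues: vnodes that were only ever
touched once get recycled first, and vnodes that keep getting
re-referenced are recycled only as a last resort, so a one-pass scan
over gigabytes of files cannot push out the vnodes the xterm+shell
startup keeps reusing:

    /*
     * Toy sketch: recycle read-once vnodes before long-lived ones.
     * All structure and field names here are hypothetical.
     */
    #include <stdio.h>
    #include <sys/queue.h>

    struct vnode {
        int v_id;
        int v_reuse;                    /* re-references after first use */
        TAILQ_ENTRY(vnode) v_list;
    };

    TAILQ_HEAD(vlist, vnode);
    static struct vlist once_q  = TAILQ_HEAD_INITIALIZER(once_q);   /* read-once */
    static struct vlist reuse_q = TAILQ_HEAD_INITIALIZER(reuse_q);  /* long-lived */

    /* Place a vnode on the queue matching how it has been used so far. */
    static void
    enqueue(struct vnode *vp)
    {
        if (vp->v_reuse > 0)
            TAILQ_INSERT_TAIL(&reuse_q, vp, v_list);
        else
            TAILQ_INSERT_TAIL(&once_q, vp, v_list);
    }

    /* Reclaim: read-once vnodes go first, long-lived ones last. */
    static struct vnode *
    reclaim(void)
    {
        struct vnode *vp = TAILQ_FIRST(&once_q);
        struct vlist *q = &once_q;

        if (vp == NULL) {
            vp = TAILQ_FIRST(&reuse_q);
            q = &reuse_q;
        }
        if (vp != NULL)
            TAILQ_REMOVE(q, vp, v_list);
        return vp;
    }

    int
    main(void)
    {
        static struct vnode v[4] = {
            { .v_id = 0, .v_reuse = 9 },   /* shared library, hit constantly */
            { .v_id = 1, .v_reuse = 0 },   /* file from a one-pass rdist scan */
            { .v_id = 2, .v_reuse = 0 },
            { .v_id = 3, .v_reuse = 5 },
        };
        for (int i = 0; i < 4; i++)
            enqueue(&v[i]);

        printf("reclaimed vnode %d\n", reclaim()->v_id);   /* vnode 1 */
        printf("reclaimed vnode %d\n", reclaim()->v_id);   /* vnode 2 */
        return 0;
    }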
-Matt