hammer_alloc_data panic

Wed Jul 2 16:21:48 PDT 2008

:Actually it seems to occur on every reblock.
:
:-- 
:Gergo Szakal MD <bastyaelvtars at gmail.com>
:University Of Szeged, HU
:Faculty Of General Medicine
:
:/* Please do not CC me with replies, thank you. */

(kgdb) print *errorp
$1 = 28	(ENOSPC)

    Ok, it is failing on ENOSPC.  But the root volume is reporting
    158 big blocks free (1264 MB free).  The reason it is a panic is
    that there is supposed to be a higher-level check for available
    free space BEFORE the actual allocation is attempted.  That clearly
    is not happening in the reblocker.

(kgdb) frame 12
(kgdb) print trans->rootvol->ondisk->vol0_stat_bigblocks
$5 = 1758
(kgdb) print trans->rootvol->ondisk->vol0_stat_freebigblocks
$6 = 158
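
    For reference, the arithmetic behind the 1264 MB figure can be sketched
    as below.  The 8 MB big-block size is inferred from the numbers above
    (158 big-blocks = 1264 MB); the helper names are made up for this
    illustration and are not HAMMER functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 MB per big-block, inferred from 158 big-blocks = 1264 MB. */
#define BIGBLOCK_MB 8

/* Convert a big-block count (e.g. vol0_stat_freebigblocks) to megabytes. */
static int64_t bigblocks_to_mb(int64_t bigblocks)
{
    assert(bigblocks >= 0);
    return bigblocks * BIGBLOCK_MB;
}

/* Percentage of the volume still free, rounded down. */
static int free_percent(int64_t free_bigblocks, int64_t total_bigblocks)
{
    if (total_bigblocks <= 0)
        return 0;
    return (int)(free_bigblocks * 100 / total_bigblocks);
}
```

    With the values printed above, 158 free out of 1758 total works out to
    roughly 8% of the volume, so the filesystem is low on space but not
    actually full.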

    I think I may know what is going on.  Does the reblocker always seem
    to start working ok, run for a little while, and then panic?  Or does
    it panic immediately?

    Second question... if you do this, how big a file can you create before
    the filesystem runs out of space (or crashes)?

	dd if=/dev/zero of=<some_new_file_on_hammer> bs=32k

    If the dd is able to create a large file (> 500MB) then I have a pretty
    good idea what the problem is, and it should be easy to fix.  Either
    the free space estimator is broken, or the reblocker is building up
    a lot of reserved big-blocks during its operation without flushing.

    I'll explain that last bit.  When the data associated with a record is
    freed, the space can't actually be reused for *TWO* flush cycles
    (60 seconds, approximately).  It can't be reused immediately because
    no UNDOs are generated for data, only for meta-data.  We don't want
    the crash recovery code to re-associate data that was previously
    deleted, but which may also have been overwritten.
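
    The two-flush delay can be modelled with a toy structure like the one
    below.  This is only a sketch of the idea and not the HAMMER code;
    every name in it (space_model, model_free, model_flush) is made up
    for the illustration.

```c
#include <assert.h>

/* Flush cycles that must pass before freed data space may be reused. */
#define REUSE_DELAY 2

/* Hypothetical toy model of the volume's space accounting. */
struct space_model {
    int free_blocks;            /* allocatable right now */
    int delayed[REUSE_DELAY];   /* delayed[i]: reusable after i+1 more flushes */
};

/* Free n blocks of data: they go to the back of the delay queue. */
static void model_free(struct space_model *m, int n)
{
    assert(n >= 0);
    m->delayed[REUSE_DELAY - 1] += n;
}

/* One flush cycle: the oldest delayed blocks become allocatable again. */
static void model_flush(struct space_model *m)
{
    m->free_blocks += m->delayed[0];
    for (int i = 0; i + 1 < REUSE_DELAY; i++)
        m->delayed[i] = m->delayed[i + 1];
    m->delayed[REUSE_DELAY - 1] = 0;
}
```

    In this model, blocks passed to model_free() only show up in
    free_blocks after the second call to model_flush().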

    The reblocker is constantly allocating new space and freeing the old
    space.  I'll bet the problem is simply that the reblocker is able to
    reallocate 1264MB worth of data without building up enough meta-data
    changes to force two flush cycles to actually free up the areas it
    had deallocated.

    I'm hoping that's the problem.  If it is then the solution is simple.
    I just need to track the delayed-free space, do a space check in the
    reblocker loop, and abort the reblock when space runs too low.
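
    Roughly, the shape of that check might look like the sketch below.
    This is a toy model under stated assumptions, not the actual fix:
    vol_model, reblock_model and the REBLOCK_RESERVE threshold are all
    hypothetical, and the real code would also have to deal with flush
    cycles returning delayed-free space.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical threshold: stop reblocking at or below this many free blocks. */
#define REBLOCK_RESERVE 8

/* Hypothetical toy model of the volume's big-block accounting. */
struct vol_model {
    int free_blocks;    /* big-blocks allocatable right now */
    int delayed_free;   /* freed, but not reusable until two flushes pass */
};

/*
 * Relocate up to blocks_to_move big-blocks.  Each move allocates a new
 * block immediately, while the old block only becomes delayed-free.
 * Returns 0 on completion or ENOSPC if aborted; *moved is the number
 * of blocks actually relocated.
 */
static int reblock_model(struct vol_model *v, int blocks_to_move, int *moved)
{
    assert(blocks_to_move >= 0);
    *moved = 0;
    while (*moved < blocks_to_move) {
        /* Check space BEFORE allocating, so the allocator never fails. */
        if (v->free_blocks <= REBLOCK_RESERVE)
            return ENOSPC;
        v->free_blocks--;       /* new copy of the data */
        v->delayed_free++;      /* old copy, not yet reusable */
        (*moved)++;
    }
    return 0;
}
```

    The point of the design is that the loop returns an error to the
    caller instead of letting the allocation itself hit ENOSPC and panic.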

    I'll work on a fix today.  I want HAMMER to be graceful when disk space
    is low.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>