VM idle page zeroing

Matthew Dillon dillon at apollo.backplane.com
Mon May 17 07:58:48 PDT 2010

:On a 2.4 GHz P4 with 2GB of RAM - buildkernel took 1048 sec on a kernel 
:immediately preceding the idlezero patch; on a kernel with the idlezero 
:patch but with it disabled, 1054 sec; with idlezero on, 1051 sec; with
:idlezero and nocache on, 1052 sec. So as to whether it improved 
:performance, 'too close to call'.
:In the non-idlezero and idlezero runs, there were ~6.5M zero-fill faults; 
:less than 1% were ozfod (found a zero page available) in the former, 
:approximately 40% in the latter. At various points during the build in the 
:idlezero case, we just ran out of zero pages and it would be some time 
:before they were restored.
:Not sure what to make of this.
:-- vs

    You can calculate the actual time.  Compile and run the mbwtest
    program from /usr/src/test/sysperf/mbwtest.c and take the non-cache
    bandwidth, then divide 6.5M x 4KB by that bandwidth.  On my test box
    non-cache bandwidth is 4672 MB/sec, so:

	6.5M x 4KB / 4672 MB/sec = ~5.6 seconds.

    If only 40% of those are pre-zeroed then the difference will only
    be 2.8 seconds best case.  And that's only if the execution of the
    build is completely serialized, and barring other issues.
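    The arithmetic above can be checked with a little back-of-the-envelope
    program (my own illustration; the fault count and bandwidth figure are
    the ones quoted in this thread, and whether "MB" is 10^6 or 2^20 bytes
    shifts the result a little):

	/* Estimate total page-zeroing time for the build:
	 * 6.5M zero-fill faults x 4KB pages at 4672 MB/sec
	 * of non-cache memory bandwidth. */
	#include <stdio.h>

	int main(void)
	{
	    double faults = 6.5e6;        /* zero-fill faults during buildkernel */
	    double page_size = 4096.0;    /* bytes per page */
	    double bw = 4672.0e6;         /* non-cache bandwidth, bytes/sec (decimal MB) */

	    double seconds = faults * page_size / bw;
	    printf("total zeroing time: %.1f seconds\n", seconds);
	    return 0;
	}

    which lands in the same ~5.6 second ballpark, and ~2.8 seconds if only
    40% of the faults find a pre-zeroed page.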

    If pre-zeroing makes a difference anywhere it will be with serialized
    programs... programs which do not use parallelism and take a lot of
    faults.  Shell scripts might be one example.  Perhaps a pkgsrc build.
    Perhaps application startup (like maybe firefox).  Regardless it would
    take a considerable load to see anything noticeable.

    Another thing to note is that an inline zfod fault zeros the page
    through the cache, meaning the bytes in the page will already be
    'hot' on return from the fault.  Since the program is about to access
    the page anyway a lot of this overhead winds up being useful to the
    program as the page will already be in the cpu cache.

    When the program uses a pre-zeroed page instead, the contents of the
    page will not be 'hot' on return from the fault and the program itself
    will have to load the data from ram into the cpu cache.  Not only
    that, but it actually has to issue memory reads to the ram, whereas in
    the inline zeroing case the zeroing operation itself, which makes the
    cache 'hot', issues only writes.
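    The cache-warming effect can be sketched in userland (a minimal
    illustration of the principle, not DragonFly kernel code; the timing
    you see will vary by machine):

	/* Model of the "inline zfod" path: zeroing a page with writes
	 * pulls its cache lines into the cpu cache, so the reads a
	 * program does right after the fault hit cache instead of ram.
	 * A pre-zeroed page skips the memset but takes the read misses. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>

	#define PAGE_SIZE 4096

	static long long touch(const unsigned char *p)
	{
	    long long sum = 0;
	    for (int i = 0; i < PAGE_SIZE; i++)
	        sum += p[i];            /* the reads the program would do anyway */
	    return sum;
	}

	int main(void)
	{
	    unsigned char *page = malloc(PAGE_SIZE);
	    struct timespec t0, t1;
	    if (page == NULL)
	        return 1;

	    clock_gettime(CLOCK_MONOTONIC, &t0);
	    memset(page, 0, PAGE_SIZE); /* zeroing writes warm the cache */
	    long long sum = touch(page);/* these reads now hit cache */
	    clock_gettime(CLOCK_MONOTONIC, &t1);

	    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
	                 + (t1.tv_nsec - t0.tv_nsec);
	    printf("sum=%lld zero+read took %lld ns\n", sum, ns);
	    free(page);
	    return 0;
	}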

    This eats away at the advantages of pre-zeroing as well.

    I do think there are still going to be cases where pre-zeroing does
    in fact help, and it certainly doesn't hurt, so it is probably worth
    running it at a low or medium burn rate.  You can mess around with the
    parameters to remove as much of the downside as possible.

    Pre-zeroing might also do much, much better on machines with very
    low memory bandwidths, such as (possibly) netbooks.

					Matthew Dillon 
					<dillon at backplane.com>
