PostgreSQL 9.3 benchmarks
Matthew Dillon
dillon at apollo.backplane.com
Mon Mar 10 17:13:31 PDT 2014
The likely main culprit is VM faults. Since postgres service processes
fork rather than thread, and use mmap rather than sysv shm, and each
one accesses the same multi-gigabyte mmap'd anonymous memory for the
database cache, the VM faults multiply: CacheSize x #Service-processes.
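The pattern is easy to reproduce outside of postgres. Here is a rough,
untested sketch (not postgres's actual code; region size and worker
count are made-up illustration values) of the fork + shared-mmap
arrangement that produces the multiplication:

    /*
     * A parent mmap()s a shared anonymous region standing in for the
     * buffer cache, then forks N workers.  Each worker takes its own
     * soft fault on every page it touches, so the total fault count
     * scales as (cache pages) x (number of workers) unless the kernel
     * can share page table pages or use 2MB pages.
     */
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CACHE_SIZE  (256UL * 1024 * 1024)  /* stand-in for a multi-GB cache */
    #define NWORKERS    8

    int
    main(void)
    {
        char *cache = mmap(NULL, CACHE_SIZE, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANON, -1, 0);
        if (cache == MAP_FAILED) {
            perror("mmap");
            return (1);
        }
        for (int i = 0; i < NWORKERS; i++) {
            if (fork() == 0) {
                /* each worker faults in every page of the shared region */
                long pgsz = sysconf(_SC_PAGESIZE);
                for (size_t off = 0; off < CACHE_SIZE; off += pgsz)
                    cache[off]++;
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        printf("watch the fault rate in systat -vm 1 while this runs\n");
        return (0);
    }

Running something like that while watching the fault column should show
the effect directly on any of the systems being compared.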
For FreeBSD to be efficient here it would need to use 2MB pages for the
whole cache. I don't know the parameters that need to be set to ensure
that that is the case. DragonFly doesn't have the problem when
machdep.pmap_mmu_optimize is turned on (which it is for these tests),
even though DFly uses 4KB pages, because the kernel will share page
table pages (even across fork()s) for compatible mappings.
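If I had to guess at the FreeBSD side (unverified, so treat the knob
names as assumptions a FreeBSD person should confirm):
vm.pmap.pg_ps_enabled should control transparent superpage promotion,
and MAP_ALIGNED_SUPER asks mmap() for superpage-aligned placement so
promotion to 2MB pages can happen. Something like this could be used to
sanity-check a FreeBSD box:

    /*
     * Unverified FreeBSD sketch: report whether superpage promotion is
     * enabled and ask for a superpage-aligned shared mapping.  The
     * sysctl name and MAP_ALIGNED_SUPER are my best guess at the
     * relevant knobs.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/mman.h>
    #include <stdio.h>

    int
    main(void)
    {
        int enabled = 0;
        size_t len = sizeof(enabled);

        if (sysctlbyname("vm.pmap.pg_ps_enabled", &enabled, &len, NULL, 0) == 0)
            printf("superpage promotion: %s\n", enabled ? "on" : "off");
        else
            perror("sysctlbyname");

        /* ask for a superpage-aligned 1GB shared region, like a buffer cache */
        void *cache = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANON | MAP_ALIGNED_SUPER, -1, 0);
        if (cache == MAP_FAILED)
            perror("mmap");
        return (0);
    }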
The second issue is the fact that the VM faults are all occurring on the
same VM object and, in fact, often on the same vm_page (just from umpteen
different discrete processes). SMP contention will be horrendous if
the fault path isn't optimized to deal with the case. DragonFly
optimizes the VM fault path.
A third possible issue is process scheduling. Basically there are
N clients and N servers (each server backed by postgres's multi-GB shared
memory cache), and each client:server pair is running heavy IPC between
them. The scheduler has to be aware of the cpu thread/cache topology
to optimally schedule all the processes to minimize off-chip cache
bus traffic. This artifact can be seen in the linux vs DFly graphs.
The performance curve difference you see there is 100% scheduler related.
As best I could determine, the linux scheduler was making too many
trade-offs in order to get the flat right-most side of the graph. DFly
is able to stay ahead of it most of the time by not making those
trade-offs, but of course that means we fall off a little harder at
the end.
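One way to sanity-check how much of the difference is scheduler-related
is to take the scheduler out of the loop and pin each client:server
pair onto the two hardware threads of a single core by hand. A rough
FreeBSD-flavored sketch (whether cpu 0 and cpu 1 are actually sibling
threads is an assumption about the topology; check with cpuset -g
first):

    /*
     * Pin the current process and a forked child onto cpus 0 and 1 so
     * the pair's IPC stays within one core's caches, regardless of
     * what the scheduler would otherwise do.
     */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    pin_pid(pid_t pid, int cpu)
    {
        cpuset_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, pid,
            sizeof(mask), &mask));
    }

    int
    main(void)
    {
        pid_t child = fork();

        if (child == 0) {
            /* "client" half of the pair; a real test would exec the client here */
            pin_pid(getpid(), 0);
            pause();
            _exit(0);
        }
        /* "server" half of the pair stays on the sibling thread */
        pin_pid(getpid(), 1);
        printf("client %d -> cpu 0, server %d -> cpu 1\n",
            (int)child, (int)getpid());
        return (0);
    }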
So in terms of this benchmark, what we need is a FreeBSD expert to tell
us how to tune FreeBSD, primarily how to ensure that 2MB pages are being
used. Success can be checked by observing the VM fault rate in systat -vm 1
output during the test. Even so, there are going to be issues.
-Matt
Matthew Dillon
<dillon at backplane.com>