PostgreSQL 9.3 benchmarks
Matthew Dillon
dillon at apollo.backplane.com
Mon Mar 10 17:13:31 PDT 2014
The likely main culprit is VM faults. Since postgres service processes
fork rather than thread, and use mmap rather than sysv shm, and each
one accesses the same multi-gigabyte mmap'd anonymous memory for the
database cache, the VM faults multiply: CacheSize x #Service-processes.
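The pattern is easy to reproduce outside of postgres. Here is a rough,
untested sketch (not postgres's actual code; region size and worker
count are made-up illustration values) of the fork + shared-mmap
arrangement that produces the multiplication:

    /*
     * A parent mmap()s a shared anonymous region standing in for the
     * buffer cache, then forks N workers.  Each worker takes its own
     * soft fault on every page it touches, so the total fault count
     * scales as (cache pages) x (number of workers) unless the kernel
     * can share page table pages or use 2MB pages.
     */
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CACHE_SIZE  (256UL * 1024 * 1024)  /* stand-in for a multi-GB cache */
    #define NWORKERS    8

    int
    main(void)
    {
        char *cache = mmap(NULL, CACHE_SIZE, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANON, -1, 0);
        if (cache == MAP_FAILED) {
            perror("mmap");
            return (1);
        }
        for (int i = 0; i < NWORKERS; i++) {
            if (fork() == 0) {
                /* each worker faults in every page of the shared region */
                long pgsz = sysconf(_SC_PAGESIZE);
                for (size_t off = 0; off < CACHE_SIZE; off += pgsz)
                    cache[off]++;
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        printf("watch the fault rate in systat -vm 1 while this runs\n");
        return (0);
    }

Running something like that while watching the fault column should show
the effect directly on any of the systems being compared.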
For FreeBSD to be efficient here it would need to use 2MB pages for the
whole cache. I don't know the parameters that need to be set to ensure
that that is the case. DragonFly doesn't have the problem when
machdep.pmap_mmu_optimize is turned on (which it is for these tests),
even though DFly uses 4KB pages, because the kernel will share page
table pages (even across fork()s) for compatible mappings.
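If I had to guess at the FreeBSD side (unverified, so treat the knob
names as assumptions a FreeBSD person should confirm):
vm.pmap.pg_ps_enabled should control transparent superpage promotion,
and MAP_ALIGNED_SUPER asks mmap() for superpage-aligned placement so
promotion to 2MB pages can happen. Something like this could be used to
sanity-check a FreeBSD box:

    /*
     * Unverified FreeBSD sketch: report whether superpage promotion is
     * enabled and ask for a superpage-aligned shared mapping.  The
     * sysctl name and MAP_ALIGNED_SUPER are my best guess at the
     * relevant knobs.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/mman.h>
    #include <stdio.h>

    int
    main(void)
    {
        int enabled = 0;
        size_t len = sizeof(enabled);

        if (sysctlbyname("vm.pmap.pg_ps_enabled", &enabled, &len, NULL, 0) == 0)
            printf("superpage promotion: %s\n", enabled ? "on" : "off");
        else
            perror("sysctlbyname");

        /* ask for a superpage-aligned 1GB shared region, like a buffer cache */
        void *cache = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANON | MAP_ALIGNED_SUPER, -1, 0);
        if (cache == MAP_FAILED)
            perror("mmap");
        return (0);
    }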
The second issue is the fact that the VM faults are all occurring on the
same VM object and, in fact, often on the same vm_page (just from umpteen
different discrete processes). SMP contention will be horrendous if
the fault path isn't optimized to deal with the case. DragonFly
optimizes the VM fault path.
A third possible issue is process scheduling. Basically there are
N clients and N servers (each server backed by postgres's multi-GB shared
memory cache), and each client:server pair is running heavy IPC between
them. The scheduler has to be aware of the cpu thread/cache topology
to optimally schedule all the processes to minimize off-chip cache
bus traffic. This artifact can be seen in the linux vs DFly graphs.
The performance curve difference you see there is 100% scheduler related.
As best I could determine, the linux scheduler was making too many
trade-offs in order to get the flat right-most side of the graph. DFly
is able to stay ahead of it most of the time by not making those
trade-offs, but of course that means we fall off a little harder at
the end.
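One way to sanity-check how much of the difference is scheduler-related
is to take the scheduler out of the loop and pin each client:server
pair onto the two hardware threads of a single core by hand. A rough
FreeBSD-flavored sketch (whether cpu 0 and cpu 1 are actually sibling
threads is an assumption about the topology; check with cpuset -g
first):

    /*
     * Pin the current process and a forked child onto cpus 0 and 1 so
     * the pair's IPC stays within one core's caches, regardless of
     * what the scheduler would otherwise do.
     */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    pin_pid(pid_t pid, int cpu)
    {
        cpuset_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, pid,
            sizeof(mask), &mask));
    }

    int
    main(void)
    {
        pid_t child = fork();

        if (child == 0) {
            /* "client" half of the pair; a real test would exec the client here */
            pin_pid(getpid(), 0);
            pause();
            _exit(0);
        }
        /* "server" half of the pair stays on the sibling thread */
        pin_pid(getpid(), 1);
        printf("client %d -> cpu 0, server %d -> cpu 1\n",
            (int)child, (int)getpid());
        return (0);
    }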
So in terms of this benchmark, what we need is a FreeBSD expert to tell
us how to tune FreeBSD, primarily how to ensure that 2MB pages are being
used. Success can be checked by observing the VM fault rate in systat -vm 1
output during the test. Even so, there are going to be issues.
-Matt
Matthew Dillon
<dillon at backplane.com>