In-pipeline instruction timing tests
Matthew Dillon
dillon at apollo.backplane.com
Mon Feb 9 10:18:04 PST 2004
Here are some basic instruction timing tests. My particular interest
is in the compare/jz/addl-to-mem test versus a cmpxchgl or locked
cmpxchgl. The compare/jz/addl-to-mem test simulates the new token
code overhead (minus the %fs load-from-memory which I cannot easily
simulate from userland), while a locked compare-exchange simulates
a mutex.
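For readers who want to try something similar, here is a rough sketch of what such userland loops might look like in portable C. It is not the original test code: the function names, the owner_flag stand-in for the token check, and the use of C11 atomics in place of hand-written cmpxchgl are all assumptions, and a compiler is free to emit different instructions than the ones named above. Loop overhead is included in the measured time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

static volatile int mem_counter;  /* memory word hit by the unlocked adds */
static volatile int owner_flag;   /* hypothetical stand-in for the token check */
static atomic_int   lock_word;    /* word hit by the atomic compare-exchange */

/* Plain read-modify-write of memory, no atomicity requested. */
static void bench_add(long loops)
{
    for (long i = 0; i < loops; ++i)
        mem_counter += 1;
}

/* Compare, conditional branch, then a plain add to memory. */
static void bench_cmpadd(long loops)
{
    for (long i = 0; i < loops; ++i) {
        if (owner_flag == 0)
            mem_counter += 1;
    }
}

/* One atomic compare-exchange per iteration, toggling the word 0 -> 1 -> 0.
 * On x86 this is typically compiled to a lock-prefixed cmpxchg. */
static void bench_locked_cmpxchg(long loops)
{
    for (long i = 0; i < loops; ++i) {
        int expected = (int)(i & 1);
        int desired = 1 - expected;
        atomic_compare_exchange_strong(&lock_word, &expected, desired);
    }
}

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average wall-clock time per loop iteration, in nanoseconds. */
static double ns_per_loop(void (*fn)(long), long loops)
{
    assert(loops > 0);
    double start = now_ns();
    fn(loops);
    return (now_ns() - start) / (double)loops;
}
```

Calling ns_per_loop(bench_cmpadd, loops) and ns_per_loop(bench_locked_cmpxchg, loops) with a large loop count gives per-iteration figures that can be compared against each other. Absolute values will depend on the compiler, the optimization level and the cpu.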
Note in particular that a cmp/jz/addl sequence seems to be far better
pipelined on both the AMD64 and a P4 than a cmpxchgl no matter which
way you turn it, and that *ANY* locked bus cycle instruction does really
horrible things to the cpu's pipeline.
                                  2xP3        AMD64       1xP4
                                  1.2GHz      3200+       1.7GHz
cpu_add,     addl to mem          1.535ns     0.194ns     0ns (1)
cpu_ladd,    lock; addl to mem    37.50ns     7.869ns     69.660ns
cpu_call,    1 call/ret           3.934ns     1.921ns     3.550ns
cpu_cmpadd,  cmp/jz/addl mem      4.027ns     0.583ns     0.765ns
cpu_cmpexg,  cmpex                6.420ns     2.169ns     7.100ns
cpu_lcmpext,                      42.84ns     7.479ns     72.35ns
note(1): addl to mem is completely absorbed or almost
completely absorbed by the cpu's pipeline in this test.
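To put the nanosecond figures in terms of clock cycles, one can multiply by the clock rate in GHz. The helper below is only an illustrative sketch. Its name is made up, and it assumes the clock values in the table header are the actual core clocks of the test machines.

```c
/* Convert a per-iteration time in nanoseconds to approximate clock cycles.
 * clock_ghz is the nominal core clock, taken here from the table header
 * (an assumption: the real clock during the test may have differed). */
static double ns_to_cycles(double ns, double clock_ghz)
{
    return ns * clock_ghz;  /* ns * (cycles per ns) */
}
```

With the table's numbers, ns_to_cycles(37.50, 1.2) comes to about 45 cycles for the locked add on the P3, and ns_to_cycles(69.660, 1.7) to about 118 cycles on the P4. The AMD64 column is left out because 3200+ is a model rating rather than a clock frequency.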
In any case, this really solidifies my desire to avoid locked bus
cycle instructions.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>