In-pipeline instruction timing tests
    Matthew Dillon 
    dillon at apollo.backplane.com
       
    Mon Feb  9 10:18:04 PST 2004
    
    
  
    Here are some basic instruction timing tests.  My particular interest
    is in the compare/jz/addl-to-mem test versus a cmpxchgl or locked
    cmpxchgl.  The compare/jz/addl-to-mem test simulates the new token
    code overhead (minus the %fs load-from-memory, which I cannot easily
    simulate from userland), while a locked compare-exchange simulates
    a mutex.
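
    As a rough illustration of the two paths being compared, here is a
    minimal C sketch.  It is not the test program used for the numbers in
    this post; the function names and the meaning of the operands are
    hypothetical.  The first function is a plain compare, branch, and
    non-atomic add to memory.  The second uses a C11 atomic
    compare-exchange, which compilers typically emit as a locked cmpxchg
    on x86.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the token path: compare, conditional branch,
 * then a plain add to memory.  No lock prefix is involved, so this is
 * only safe when no other cpu touches *count concurrently.
 */
static inline bool token_like_acquire(int *count, int expected)
{
    if (*count != expected)     /* cmp + conditional branch */
        return false;
    *count += 1;                /* plain add to memory */
    return true;
}

/*
 * Hypothetical stand-in for a mutex acquire: atomically change the lock
 * word from 0 (unlocked) to 1 (locked).  Returns false if it was already
 * locked.
 */
static inline bool mutex_like_acquire(atomic_int *lock_word)
{
    int unlocked = 0;
    return atomic_compare_exchange_strong(lock_word, &unlocked, 1);
}
```

    The first function only works if the data is never touched by another
    cpu at the same time; the second pays for a locked bus cycle on every
    call, contended or not.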

    Note in particular that a cmp/jz/addl sequence seems to be far better
    pipelined on both the AMD64 and the P4 than a cmpxchgl no matter which
    way you turn it, and that *ANY* locked bus cycle instruction does really
    horrible things to the CPU's pipeline.

                                    2xP3      AMD64     1xP4
                                    1.2GHz    3200+     1.7GHz
    cpu_add, addl to mem            1.535ns   0.194ns   0ns (1)
    cpu_ladd, lock; addl to mem     37.50ns   7.869ns   69.660ns
    cpu_call, 1 call/ret            3.934ns   1.921ns   3.550ns
    cpu_cmpadd, cmp/jz/addl mem     4.027ns   0.583ns   0.765ns
    cpu_cmpexg, cmpex               6.420ns   2.169ns   7.100ns
    cpu_lcmpext,                    42.84ns   7.479ns   72.35ns

    note(1): addl to mem is completely absorbed or almost
    completely absorbed by the cpu's pipeline in this test.
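
    For anyone who wants to try something similar, here is a minimal
    sketch of how a per-operation time can be measured from userland.  It
    is an assumption about the general approach, not the program that
    produced the table above: the function names are hypothetical, it
    uses the standard clock() call for timing, and the result includes
    loop overhead.

```c
#include <stdatomic.h>
#include <time.h>

static volatile int plain_counter;   /* volatile keeps the add in the loop */
static atomic_int   locked_counter;  /* updated with an atomic add */

/* Convert a clock() interval into average nanoseconds per iteration. */
static double ns_per_iteration(clock_t start, clock_t end, long iterations)
{
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds * 1e9 / (double)iterations;
}

/* Average cost of a plain read-modify-write add to memory. */
static double time_plain_add(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        plain_counter += 1;
    return ns_per_iteration(start, clock(), iterations);
}

/* Average cost of an atomic add (typically a locked add on x86). */
static double time_atomic_add(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        atomic_fetch_add(&locked_counter, 1);
    return ns_per_iteration(start, clock(), iterations);
}
```

    Calling each function with a large iteration count (tens of millions
    or more) gives a figure comparable in spirit to the cpu_add and
    cpu_ladd rows, though the absolute numbers depend on the compiler,
    the optimization level, and the cpu.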

    In any case, this really solidifies my desire to avoid locked bus
    cycle instructions.

						-Matt
						Matthew Dillon
						<dillon at xxxxxxxxxxxxx>
    
    