i386 version of cpu_sfence()

Fri Jan 28 10:58:06 PST 2011

:Hi all,
:
:The i386 version of cpu_sfence() is just asm volatile ("" :::"memory").
:
:According to the instruction set, sfence should also ensure the
:"global visibility" (i.e. an empty CPU store buffer) of the stores
:before the sfence.
:So should we do the same as cpu_mfence(), i.e. use a locked memory access?
:
:Best Regards,
:sephe

    cpu_sfence() is basically a NOP, because x86 cpus already order
    writes for global visibility.  The volatile ..."memory" macro is
    roughly equivalent to cpu_ccfence(): it prevents the compiler
    itself from optimizing or reordering actual instructions around
    that point in the code.
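
    As a rough illustration of the compiler-only barrier described
    above, here is a minimal sketch in GCC/Clang syntax.  The names
    my_ccfence(), my_sfence(), publish() and publish_check() are
    hypothetical stand-ins for this example, not the kernel's actual
    definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the macros discussed above (GCC/Clang syntax). */
#define my_ccfence() __asm__ __volatile__("" ::: "memory")
#define my_sfence()  my_ccfence()   /* emits no instruction at all */

static int payload;
static int ready;

/*
 * Store the data, then the flag.  The empty asm with a "memory" clobber
 * generates no machine code; it only stops the compiler from moving or
 * merging memory accesses across this point.
 */
static void publish(int value)
{
    payload = value;
    my_sfence();
    ready = 1;
}

/* Single-threaded sanity check of the sketch. */
static void publish_check(void)
{
    publish(42);
    assert(ready == 1 && payload == 42);
}
```

    Calling publish_check() only confirms the stores happen; the
    barrier's effect shows up in the generated assembly, where the
    two stores keep their source order.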

    lfence/mfence require actual work.  Even though cpus guarantee
    ordered global visibility on writes, they also reorder reads and
    do speculative reads.  Thus lfence/mfence are real instructions.
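
    To make the two candidates from the question concrete, the sketch
    below puts a full fence between a store and a later load, once
    with mfence and once with a locked memory access.  The macro and
    function names are hypothetical, the inline asm assumes GCC/Clang,
    and the non-x86 branch exists only so the example compiles
    elsewhere.

```c
#include <assert.h>

/* Hypothetical fence helpers (GCC/Clang inline asm), not the kernel macros. */
#if defined(__x86_64__)
#define FULL_FENCE_MFENCE() __asm__ __volatile__("mfence" ::: "memory")
#define FULL_FENCE_LOCKED() \
    __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define FULL_FENCE_MFENCE() __asm__ __volatile__("mfence" ::: "memory") /* needs SSE2 */
#define FULL_FENCE_LOCKED() \
    __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc")
#else
/* Fallback for other architectures: compiler-provided full barrier. */
#define FULL_FENCE_MFENCE() __sync_synchronize()
#define FULL_FENCE_LOCKED() __sync_synchronize()
#endif

static volatile int flag_a, flag_b;

/*
 * Raise our own flag, then read the peer's flag.  Without a full fence
 * the load may be satisfied while our store is still in the store buffer.
 */
static int raise_and_peek(volatile int *mine, volatile int *theirs,
                          int use_locked)
{
    *mine = 1;
    if (use_locked)
        FULL_FENCE_LOCKED();    /* locked add of 0 to the top of the stack */
    else
        FULL_FENCE_MFENCE();
    return *theirs;
}

/* Single-threaded sanity check: both variants return the peer's value. */
static void fence_check(void)
{
    assert(raise_and_peek(&flag_a, &flag_b, 0) == 0);
    assert(raise_and_peek(&flag_b, &flag_a, 1) == 1);
}
```

    The locked add of zero leaves the stack contents unchanged; it is
    used purely for its ordering side effect.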

    I dunno what the best approach is but I suggest someone modify
    one of the benchmark tests in /usr/src/test/sysperf to measure
    the cost of lfence/mfence vs locked memory instructions.  For
    such a test to be reasonable the test must create two threads
    which compete for the memory location in question.
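
    A userland sketch of the kind of test suggested above might look
    like the following.  It is not one of the existing sysperf tests;
    bench_fence(), bench_worker() and the macros are hypothetical
    names, it assumes GCC/Clang inline asm and POSIX threads (build
    with -pthread), and the measured time includes thread startup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical fence helpers (GCC/Clang inline asm), not the kernel macros. */
#if defined(__x86_64__)
#define FENCE_MFENCE() __asm__ __volatile__("mfence" ::: "memory")
#define FENCE_LOCKED() \
    __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define FENCE_MFENCE() __asm__ __volatile__("mfence" ::: "memory") /* needs SSE2 */
#define FENCE_LOCKED() \
    __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc")
#else
/* Fallback for other architectures so the sketch still builds. */
#define FENCE_MFENCE() __sync_synchronize()
#define FENCE_LOCKED() __sync_synchronize()
#endif

enum fence_kind { FENCE_KIND_MFENCE, FENCE_KIND_LOCKED };

struct bench_arg {
    atomic_long *shared;    /* the location both threads compete for */
    enum fence_kind kind;
    long iters;
};

static void *bench_worker(void *p)
{
    struct bench_arg *a = p;

    for (long i = 0; i < a->iters; i++) {
        /* Contended plain store, followed by the fence under test. */
        atomic_store_explicit(a->shared, i, memory_order_relaxed);
        if (a->kind == FENCE_KIND_MFENCE)
            FENCE_MFENCE();
        else
            FENCE_LOCKED();
    }
    return NULL;
}

/* Returns wall-clock nanoseconds per loop iteration, or -1.0 on failure. */
static double bench_fence(enum fence_kind kind, long iters)
{
    static atomic_long shared;
    struct bench_arg arg = { &shared, kind, iters };
    pthread_t t[2];
    struct timespec t0, t1;
    int started = 0;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (started < 2 &&
           pthread_create(&t[started], NULL, bench_worker, &arg) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (started < 2)
        return -1.0;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

    Running bench_fence() once per fence kind with a large iteration
    count and comparing the two results gives a first approximation;
    pinning the two threads to different cpus would make the
    contention more consistent.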

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>