[dpdk-dev] [PATCH] atomic: clarify use of memory barriers

Ananyev, Konstantin konstantin.ananyev at intel.com
Mon May 26 15:57:25 CEST 2014


Hi Oliver,

>> So with the following fragment of code:
>> extern int *x;
>> extern __m128i a, *p;
>> L0:
>> _mm_stream_si128( p, a);
>> rte_compiler_barrier();
>> L1:
>> *x = 0;
>>
>> There is no guarantee that store at L0 will always be finished
>> before store at L1.

>This code fragment looks very similar to what is done in
>__rte_ring_sp_do_enqueue():
>
>    [...]
>     ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
>     rte_compiler_barrier();
>     [...]
>     r->prod.tail = prod_next;
>So, according to your previous explanation, I understand that
>this code would require a write memory barrier in place of the
>compiler barrier. Am I wrong?

No, right now a compiler barrier is enough here.
ENQUEUE_PTRS() doesn't use non-temporal stores (MOVNT*), so write order is guaranteed.
Though if in the future we change ENQUEUE_PTRS() to use non-temporal stores, we'll have to use sfence (or mfence).
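
To make the distinction concrete, here is a minimal sketch of the two cases. It is not the actual rte_ring code: struct demo_ring, publish_normal(), publish_nt() and the compiler_barrier() macro are hypothetical names made up for illustration, and it assumes x86 with SSE2 and a GCC-compatible compiler.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence() */
#include <stdint.h>

/* Hypothetical stand-in for rte_compiler_barrier(): it only stops the
 * compiler from moving memory accesses across it, no CPU instruction. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Hypothetical one-slot "ring": a payload plus a tail index that a
 * consumer on another core would poll. */
struct demo_ring {
    __m128i slot;            /* 16-byte aligned payload */
    volatile uint32_t tail;  /* published last */
};

/* Ordinary SSE store: x86 does not reorder it with the later tail
 * store, so a compiler barrier is sufficient. */
static void publish_normal(struct demo_ring *r, __m128i v)
{
    _mm_store_si128(&r->slot, v);
    compiler_barrier();
    r->tail = r->tail + 1;
}

/* Non-temporal store (MOVNTDQ): weakly ordered, so an sfence is needed
 * before the tail update becomes visible. */
static void publish_nt(struct demo_ring *r, __m128i v)
{
    /* MOVNTDQ requires a 16-byte aligned destination. */
    assert(((uintptr_t)&r->slot & 15) == 0);
    _mm_stream_si128(&r->slot, v);
    _mm_sfence();
    r->tail = r->tail + 1;
}
```

The only difference between the two functions is the store instruction and the barrier that has to follow it, which is the change we would need if ENQUEUE_PTRS() ever switched to MOVNT*.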

>Moreover, if I understand well, a real wmb() is needed only if
>a SSE store is issued. But the programmer may not control that,
>it's the job of the compiler.

'Normal' (non-MOVNT*) SIMD writes are not reordered with other writes.
So it is ok for the compiler to use them where appropriate.

>> But now, there seems to be a confusion: everyone has to remember that
>> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
>> That's why my suggestion was to simply keep using compiler_barrier()
>> for all cases, when we don't need real fence.

>I'm not sure the programmer has to know which smp_*mb() is a real fence
>or not. He just expects that it generates the proper CPU instructions
>that guarantees the effectiveness of the memory barrier.

In most cases just a compiler barrier is enough, but there are a few exceptions.
Always using fence instructions means introducing an unnecessary slowdown in cases where the order is already guaranteed.
Not using fences in cases where they are needed means introducing a race window and possible data corruption.
That's why right now people can use either rte_compiler_barrier() or mb/rmb/wmb, whichever is appropriate for the particular case.
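
As a rough sketch of what that choice looks like in code, assuming x86 with SSE2 and a GCC-compatible compiler: the demo_* macros below are hypothetical names, not the DPDK definitions, and struct demo_mailbox with demo_post()/demo_poll() is a made-up example. With ordinary loads and stores on both sides, neither side needs a fence instruction.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_mfence(), _mm_lfence(), _mm_sfence() */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical barrier flavours: one compiler-only, three real fences. */
#define demo_compiler_barrier() __asm__ volatile("" ::: "memory")
#define demo_mb()  _mm_mfence()
#define demo_wmb() _mm_sfence()
#define demo_rmb() _mm_lfence()

/* Hypothetical one-entry mailbox shared between two cores. */
struct demo_mailbox {
    uint64_t payload;
    volatile uint32_t ready;
};

/* Producer: ordinary stores stay in order on x86, so only the compiler
 * has to be restrained. */
static void demo_post(struct demo_mailbox *m, uint64_t v)
{
    assert(m != NULL);
    m->payload = v;
    demo_compiler_barrier();
    m->ready = 1;
}

/* Consumer: returns 1 and fills *out if a message was ready, else 0.
 * x86 does not reorder a load with an older load, so again a compiler
 * barrier is enough. */
static int demo_poll(struct demo_mailbox *m, uint64_t *out)
{
    if (!m->ready)
        return 0;
    demo_compiler_barrier();
    *out = m->payload;
    demo_compiler_barrier();
    m->ready = 0;
    return 1;
}
```

If demo_post() wrote the payload with a non-temporal store, the first barrier there would have to become demo_wmb() (or demo_mb()).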

Konstantin

