[dpdk-dev] [PATCH] atomic: clarify use of memory barriers
Ananyev, Konstantin
konstantin.ananyev at intel.com
Mon May 26 15:57:25 CEST 2014
Hi Oliver,
>> So with the following fragment of code:
>> extern int *x;
>> extern __m128i a, *p;
>> L0:
>> _mm_stream_si128( p, a);
>> rte_compiler_barrier();
>> L1:
>> *x = 0;
>>
>> There is no guarantee that the store at L0 will always be finished
>> before the store at L1.
>This code fragment looks very similar to what is done in
>__rte_ring_sp_do_enqueue():
>
> [...]
> ENQUEUE_PTRS(); /* I expect it is converted to an SSE store */
> rte_compiler_barrier();
> [...]
> r->prod.tail = prod_next;
>So, according to your previous explanation, I understand that
>this code would require a write memory barrier in place of the
>compiler barrier. Am I wrong?
No, right now a compiler barrier is enough here.
ENQUEUE_PTRS() doesn't use non-temporal stores (MOVNT*), so the write order should be guaranteed.
However, if in the future we change ENQUEUE_PTRS() to use non-temporal stores, we'll have to use sfence (or mfence).
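To make the difference concrete, here is a minimal sketch, assuming x86-64, SSE2 intrinsics and GCC-style inline asm. struct toy_ring, toy_enqueue_normal(), toy_enqueue_nt(), toy_peek() and the toy_* barrier macros are hypothetical names for illustration only; this is not the rte_ring/ENQUEUE_PTRS() code. The two enqueue paths differ only in the kind of store, and therefore in the barrier needed before the tail update.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_store_si128, _mm_stream_si128, _mm_cvtsi128_si32 */

/* Hypothetical toy ring, loosely modelled on rte_ring; not DPDK code.
 * No full/empty accounting: illustration only. */
#define TOY_RING_SIZE 4

struct toy_ring {
    __m128i slots[TOY_RING_SIZE]; /* 16-byte aligned, as MOVNTDQ requires */
    volatile uint32_t tail;       /* index published to the consumer */
};

/* Compiler-only barrier, the same idea as rte_compiler_barrier(). */
#define toy_compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Assumed sfence-based write barrier; also acts as a compiler barrier. */
#define toy_wmb() __asm__ __volatile__("sfence" : : : "memory")

/* Ordinary (temporal) SIMD store: x86 does not reorder it with the later
 * tail store, so only the compiler has to be stopped. */
static void toy_enqueue_normal(struct toy_ring *r, __m128i v)
{
    _mm_store_si128(&r->slots[r->tail % TOY_RING_SIZE], v);
    toy_compiler_barrier();
    r->tail = r->tail + 1;
}

/* Non-temporal store (MOVNTDQ) is weakly ordered: without the sfence the
 * tail update could become visible before the slot data. */
static void toy_enqueue_nt(struct toy_ring *r, __m128i v)
{
    _mm_stream_si128(&r->slots[r->tail % TOY_RING_SIZE], v);
    toy_wmb();
    r->tail = r->tail + 1;
}

/* Consumer: check the published tail, then read the first 32-bit lane. */
static int toy_peek(const struct toy_ring *r, uint32_t idx, int32_t *out)
{
    if (idx >= r->tail)
        return -1;          /* slot idx not published yet */
    toy_compiler_barrier(); /* x86 does not reorder loads with older loads */
    *out = _mm_cvtsi128_si32(r->slots[idx % TOY_RING_SIZE]);
    return 0;
}
```

The consumer side only needs a compiler barrier for the same reason the normal producer path does: x86 keeps the relevant order in hardware, so the fence cost is paid only on the non-temporal path.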
>Moreover, if I understand well, a real wmb() is needed only if
>an SSE store is issued. But the programmer may not control that,
>it's the job of the compiler.
'Normal' (temporal) SIMD writes are not reordered with other stores.
So it is OK for the compiler to use them where appropriate.
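A minimal sketch of the case raised above, where the programmer writes plain C and the compiler decides whether to use SSE, assuming x86-64 and GCC or Clang. struct toy_msg, toy_mailbox, toy_ready and toy_post() are hypothetical names for illustration.

```c
#include <stdint.h>
#include <string.h>

struct toy_msg {
    uint64_t lo;
    uint64_t hi;
};

/* Compiler-only barrier, the same idea as rte_compiler_barrier(). */
#define toy_compiler_barrier() __asm__ __volatile__("" : : : "memory")

static struct toy_msg toy_mailbox;
static volatile int toy_ready;

/* Publish a 16-byte message, then raise the flag. */
static void toy_post(const struct toy_msg *m)
{
    /* At -O2 on x86-64 the compiler may lower this copy to one 16-byte
     * SSE store (e.g. MOVUPS). That is an ordinary store, so the CPU does
     * not reorder it with the store to toy_ready below; only the compiler
     * must be prevented from sinking the copy past the flag store. */
    memcpy(&toy_mailbox, m, sizeof *m);
    toy_compiler_barrier();
    toy_ready = 1;
}
```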
>> But now, there seems to be a confusion: everyone has to remember that
>> smp_mb() and smp_wmb() are 'real' fences, while smp_rmb() is not.
>> That's why my suggestion was to simply keep using compiler_barrier()
>> for all cases, when we don't need real fence.
>I'm not sure the programmer has to know which smp_*mb() is a real fence
>or not. He just expects that it generates the proper CPU instructions
>that guarantee the effectiveness of the memory barrier.
In most cases just a compiler barrier is enough, but there are a few exceptions.
Always using fence instructions means introducing an unnecessary slowdown in cases where the order is already guaranteed.
Not using fences in cases where they are needed means introducing a race window and possible data corruption.
That's why, right now, people can use either rte_compiler_barrier() or mb/rmb/wmb, whichever is appropriate for the particular case.
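Besides the non-temporal store case mentioned above, one well-known exception on x86 is a store followed by a load from a different location, which the CPU may reorder. The sketch below is an illustrative example of that case, not taken from this thread; it assumes x86-64 and GCC-style inline asm, and the toy_* names are hypothetical (the real DPDK macros are rte_compiler_barrier(), rte_mb(), rte_wmb() and rte_rmb()).

```c
/* Hypothetical x86-64 barrier macros; toy_wmb()/toy_rmb() are shown for
 * completeness and are unused below. */
#define toy_compiler_barrier() __asm__ __volatile__("" : : : "memory")
#define toy_mb()  __asm__ __volatile__("mfence" : : : "memory")
#define toy_wmb() __asm__ __volatile__("sfence" : : : "memory") /* e.g. after MOVNT* stores */
#define toy_rmb() __asm__ __volatile__("lfence" : : : "memory")

static volatile int toy_flag[2];

/* Announce-then-check step of a Dekker-style handshake between two threads.
 * x86 may let the load of toy_flag[peer] complete before the earlier store
 * to toy_flag[self] is visible (store->load reordering), so a compiler
 * barrier is NOT enough here: a full fence is required. */
static int toy_try_enter(int self)
{
    int peer = 1 - self;

    toy_flag[self] = 1;
    toy_mb();
    return toy_flag[peer] == 0; /* 1: peer has not announced itself */
}

/* Leaving only needs earlier (critical-section) accesses to stay above the
 * flag store; x86 keeps store->store and load->store order, so stopping
 * the compiler is enough. */
static void toy_leave(int self)
{
    toy_compiler_barrier();
    toy_flag[self] = 0;
}
```

Without the mfence in toy_try_enter(), both threads could read the other's flag as 0 and enter together, while toy_leave() pays no fence cost at all. That is the trade-off between always fencing and fencing only where the hardware order is not enough.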
Konstantin