[dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

Wang Dong dong.wang.pro at hotmail.com
Sat May 9 12:24:26 CEST 2015


Hi Konstantin,

>
> Hi Dong,
>
>> -----Original Message-----
>> From: Wang Dong [mailto:dong.wang.pro at hotmail.com]
>> Sent: Thursday, May 07, 2015 4:28 PM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> Hi Konstantin,
>>
>>> Hi Dong,
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of WangDong
>>>> Sent: Tuesday, May 05, 2015 4:38 PM
>>>> To: dev at dpdk.org
>>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>
>>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,
>> compiler
>>>> memory barrier is enough.
>>>
>>> I wouldn't say they are 'unnecessary'.
>>> There are situations, even on IA, when you need _fence_ isntructions.
>>> So, please leave rte_*mb() macros unmodified.
>> OK, leave them unmodified, but I really can't find a situation to use
>> sfence and lfence instructions.
>
> For example:
> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>
>>
>>
>>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
>>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
>>> Though if you have some better name in mind, I am open to suggestions here.
>> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~
>
> Hmm, but why _dma_?
> We need same thing for multi-core communication too.
> If rte_smp_ is not good enough, might be: rte_arch_?
I want these two macro only used in PMD, so I think _dma_ is better. The 
memory barrier of processor-processor maybe more complex, and I'm not 
familiar with it... Someone can add rte_smp_*mb for multi-core.

I think _arch_ is means nothing here, because rte_*mb is already for 
architectures that dpdk supported, they are redefined in these architecture.

>
>>
>>>
>>>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
>>>
>>> As far as I remember, amd has the same memory ordering model.
>> It's too hard to find a AMD's software developer manual.....
>
> There for example:
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
> ?
Search such document on AMD offical website for a long time, this manual 
is what I want, thanks very much!!!

Dong

>
> Konstantin
>
>>
>> Dong
>>
>>> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
>>>
>>> Konstantin
>>>
>>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve
>> performance
>>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add
>> the
>>>> macro, the memory ordering will not be guaranteed. Which macro is better?
>>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with
>> rte_rmb()
>>>> and rte_wmb() for any architecture.
>>>>
>>>> ---
>>>>    lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> index e93e8ee..52b1e81 100644
>>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> @@ -49,10 +49,20 @@ extern "C" {
>>>>
>>>>    #define	rte_mb() _mm_mfence()
>>>>
>>>> +#ifdef RTE_ARCH_X86_IA
>>>> +
>>>> +#define rte_wmb() rte_compiler_barrier()
>>>> +
>>>> +#define rte_rmb() rte_compiler_barrier()
>>>> +
>>>> +#else
>>>> +
>>>>    #define	rte_wmb() _mm_sfence()
>>>>
>>>>    #define	rte_rmb() _mm_lfence()
>>>>
>>>> +#endif
>>>> +
>>>>    /*------------------------- 16 bit atomic operations -------------------------*/
>>>>
>>>>    #ifndef RTE_FORCE_INTRINSICS
>>>> --
>>>> 1.9.1
>>>


More information about the dev mailing list