[dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API

Ferruh Yigit ferruh.yigit at intel.com
Fri Jan 13 12:32:29 CET 2017


On 1/13/2017 8:17 AM, Jerin Jacob wrote:
> On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
>> Hi Jerin,
>>
>> On 1/12/2017 9:17 AM, Jerin Jacob wrote:
>> <...>
>>
>>> +#include <rte_io.h>
>>> +
>>>  /* CSR write macro */
>>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \
>>> -	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
>>> -			= (val)))
>>> +#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
>>> +	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
>>
>> For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
>> which is indeed not a must, is this correct?
> 
> AFAIK, Compiler barrier is required for IA. I am not an IA expert, if
> someone thinks it needs to changed then I can fix it in following commit
> in this patch series by making rte_io_wmb() and rte_io_rmb() as empty.
> 
> Let me know.
> 
> AFAIK, Linux kernel code has a barrier in readl/writel for IA.
> 
> Typically we don't use any non relaxed versions in fast path.In fast
> typically all the drivers has explicit write barrier for doorbell write
> and followed by a relaxed version of write. IMO, In any event, it won't
> generate performance regression.
> 
> [dpdk-master] $ git show
> 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> Author: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Date:   Mon Dec 5 06:36:49 2016 +0530
> 
>     eal/x86: define I/O device memory barriers for IA
> 
>     The patch does not provide any functional change for IA.
>     I/O barriers are mapped to existing smp barriers.
> 
>     CC: Bruce Richardson <bruce.richardson at intel.com>
>     CC: Konstantin Ananyev <konstantin.ananyev at intel.com>
>     Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> 
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> index 00b1cdf..4eac666 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> @@ -61,6 +61,12 @@ extern "C" {
>  
>  #define rte_smp_rmb() rte_compiler_barrier()
>  
> +#define rte_io_mb() rte_mb()
> +
> +#define rte_io_wmb() rte_compiler_barrier()
> +
> +#define rte_io_rmb() rte_compiler_barrier()
> +
>  /*------------------------- 16 bit atomic operations
>  * -------------------------*/
>  
>  #ifndef RTE_FORCE_INTRINSICS
> 
>>
>> If so, does it make sense to override these functions for x86, and make
>> rte_writeX = rte_writeX_relaxed
>> rte_readX = rte_readX_relaxed
>>
>>>  
>>>  /* CSR read macro */
>>> -#define ADF_CSR_RD(csrAddr, csrOffset) \
>>> -	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
>>> +#define ADF_CSR_RD(csrAddr, csrOffset)			\
>>> +	rte_read32((((uint8_t *)csrAddr) + csrOffset))
>>
>> This patchset both introduces new rte_readX/rte_writeX functions, also
>> applies them into drivers.
>>
>> While applying them, it changes the behavior.
>> Like above code was doing a read, but after update it does read and
>> read_memory_barrier.
>>
>> What do you think this patchset updates usage in a manner that keeps
>> behavior exact same. Like using rte_read32_relaxed for this case.
>> And doing architecture related updates in a different patchset?
> 
> Need to use rte_read32 at this commit otherwise it will break for ARM.
> That's was all point for this patchset.

Why it breaks the ARM, is it because rte_*mb() updated for ARM in this
patchset (patch 7/29) ?

I believe it is good to make these modifications in two phase:
- First replace old usage with rte_readX/rte_writeX while keeping exact
same behavior

- Second, do architecture specific changes. Both in eal and drivers
level if required.

Thanks,
ferruh

> For performance regression, we can always verify by taking delta
> between this changeset and the previous changeset. If you think, I need
> to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that
> as well.
> 
> 
>>
>> This both makes easy to see architecture specific updates, and makes
>> easy to trace any possible performance issues by this patchset.
>>
>>>  
>>>  #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
>>>  #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
>>>
>>



More information about the dev mailing list