[dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Jan 13 15:57:54 CET 2017


On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote:
> On 1/13/2017 8:17 AM, Jerin Jacob wrote:
> > On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
> >> Hi Jerin,
> >>
> >> On 1/12/2017 9:17 AM, Jerin Jacob wrote:
> >> <...>
> >>
> >>> +#include <rte_io.h>
> >>> +
> >>>  /* CSR write macro */
> >>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \
> >>> -	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
> >>> -			= (val)))
> >>> +#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
> >>> +	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
> >>
> >> For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
> >> which is indeed not a must, is this correct?
> > 
> > AFAIK, Compiler barrier is required for IA. I am not an IA expert, if
> > someone thinks it needs to changed then I can fix it in following commit
> > in this patch series by making rte_io_wmb() and rte_io_rmb() as empty.
> > 
> > Let me know.
> > 
> > AFAIK, Linux kernel code has a barrier in readl/writel for IA.
> > 
> > Typically we don't use any non relaxed versions in fast path.In fast
> > typically all the drivers has explicit write barrier for doorbell write
> > and followed by a relaxed version of write. IMO, In any event, it won't
> > generate performance regression.
> > 
> > [dpdk-master] $ git show
> > 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> > commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
> > Author: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > Date:   Mon Dec 5 06:36:49 2016 +0530
> > 
> >     eal/x86: define I/O device memory barriers for IA
> > 
> >     The patch does not provide any functional change for IA.
> >     I/O barriers are mapped to existing smp barriers.
> > 
> >     CC: Bruce Richardson <bruce.richardson at intel.com>
> >     CC: Konstantin Ananyev <konstantin.ananyev at intel.com>
> >     Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > 
> > diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> > b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> > index 00b1cdf..4eac666 100644
> > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> > @@ -61,6 +61,12 @@ extern "C" {
> >  
> >  #define rte_smp_rmb() rte_compiler_barrier()
> >  
> > +#define rte_io_mb() rte_mb()
> > +
> > +#define rte_io_wmb() rte_compiler_barrier()
> > +
> > +#define rte_io_rmb() rte_compiler_barrier()
> > +
> >  /*------------------------- 16 bit atomic operations
> >  * -------------------------*/
> >  
> >  #ifndef RTE_FORCE_INTRINSICS
> > 
> >>
> >> If so, does it make sense to override these functions for x86, and make
> >> rte_writeX = rte_writeX_relaxed
> >> rte_readX = rte_readX_relaxed
> >>
> >>>  
> >>>  /* CSR read macro */
> >>> -#define ADF_CSR_RD(csrAddr, csrOffset) \
> >>> -	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
> >>> +#define ADF_CSR_RD(csrAddr, csrOffset)			\
> >>> +	rte_read32((((uint8_t *)csrAddr) + csrOffset))
> >>
> >> This patchset both introduces new rte_readX/rte_writeX functions, also
> >> applies them into drivers.
> >>
> >> While applying them, it changes the behavior.
> >> Like above code was doing a read, but after update it does read and
> >> read_memory_barrier.
> >>
> >> What do you think this patchset updates usage in a manner that keeps
> >> behavior exact same. Like using rte_read32_relaxed for this case.
> >> And doing architecture related updates in a different patchset?
> > 
> > Need to use rte_read32 at this commit otherwise it will break for ARM.
> > That's was all point for this patchset.
> 
> Why it breaks the ARM, is it because rte_*mb() updated for ARM in this
> patchset (patch 7/29) ?

Yes.


> 
> I believe it is good to make these modifications in two phase:

It is in two phases only. First introduced the API with implementation and
enabled in each driver. Why did you think other-way around it is better?
I can rework and test if there is any value addition. If you concerned
about git bisect ability then I don't think we are loosing that in this
model.

Thoughts?

> - First replace old usage with rte_readX/rte_writeX while keeping exact
> same behavior
> 
> - Second, do architecture specific changes. Both in eal and drivers
> level if required.
> 
> Thanks,
> ferruh
> 
> > For performance regression, we can always verify by taking delta
> > between this changeset and the previous changeset. If you think, I need
> > to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that
> > as well.
> > 
> > 
> >>
> >> This both makes easy to see architecture specific updates, and makes
> >> easy to trace any possible performance issues by this patchset.
> >>
> >>>  
> >>>  #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
> >>>  #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
> >>>
> >>
> 


More information about the dev mailing list