[dpdk-dev] [PATCH v2] ring: enforce reading the tails before ring operations

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Fri Mar 8 06:06:12 CET 2019


> > > >> On 07.03.2019 9:45, gavin hu wrote:
> > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > >>> may get reordered after reading or writing the ring slots, which
> > > >>> corrupts the ring, and stale data is observed.
> > > >>>
> > > >>> This issue was reported by NXP on an 8-A72 DPAA2 board. The
> > > >>> problem is most likely caused by missing acquire semantics when
> > > >>> reading cons.tail (in SP enqueue) or prod.tail (in SC dequeue),
> > > >>> which makes it possible to read a stale value from the ring
> > > >>> slots.
> > > >>>
> > > >>> For the MP (and MC) case, rte_atomic32_cmpset() already provides
> > > >>> the required ordering. This patch prevents reads and writes of
> > > >>> the ring slots from being reordered before the read of
> > > >>> {prod,cons}.tail for the SP (and SC) case.
> > > >>
> > > >> A read barrier, rte_smp_rmb(), is OK to prevent reads of the ring
> > > >> from being reordered before the read of the tail. However, to
> > > >> prevent *writes* to the ring from being reordered *before reading*
> > > >> the tail, you need a full memory barrier, i.e. rte_smp_mb().
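
[Editor's note: the sketch below illustrates the ordering problem under discussion. It is a simplified stand-in, not the actual DPDK ring code (which lives in lib/librte_ring); the struct, field names, and helper are hypothetical, and a C11 acquire fence stands in for rte_smp_rmb().]

```c
#include <stdint.h>
#include <stdatomic.h>
#include <assert.h>

#define RING_SIZE 4u                 /* illustrative; a power of two */

struct ring {
    uint32_t cons_tail;              /* advanced by the consumer */
    uint32_t prod_head;              /* advanced by the producer */
    void *slots[RING_SIZE];
};

/* Single-producer enqueue of one object; full-ring handling is elided. */
static void sp_enqueue_one(struct ring *r, void *obj)
{
    /* (1) Read the consumer tail to compute the free space. */
    uint32_t free_entries = RING_SIZE + r->cons_tail - r->prod_head;
    assert(free_entries > 0);        /* sketch: assume space is available */

    /*
     * Without a barrier here, a weakly ordered CPU such as arm64 may hoist
     * the slot store in (2) above the cons_tail load in (1), overwriting a
     * slot the consumer has not yet released.  The patch places
     * rte_smp_rmb() at this point; on arm64 that is DMB ISHLD, which
     * orders load->load and also load->store.  A C11 acquire fence states
     * the same requirement portably.
     */
    atomic_thread_fence(memory_order_acquire); /* stand-in for rte_smp_rmb() */

    /* (2) Write the ring slot, then publish it by advancing prod_head. */
    r->slots[r->prod_head & (RING_SIZE - 1)] = obj;
    r->prod_head++;
}
```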
> > > >
> > > > rte_smp_rmb() is DMB(ISHLD), which orders LD/LD and LD/ST, while
> > > > WMB (the DMB ST option) orders ST/ST.
> > > > For more details, please refer to Table B2-1, "Encoding of the DMB
> > > > and DSB <option> parameter", in
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> > > reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > I see. But then you have to change the rte_smp_rmb() definition in
> > > lib/librte_eal/common/include/generic/rte_atomic.h and ensure that
> > > all other architectures follow the same rules.
> > > Otherwise, this change is logically wrong, because a read barrier
> > > under the current definition cannot be used to order a load with a
> > > store.
> > >
> >
> > Good points, let me re-think how to handle the other architectures.
> > A full MB is required for other architectures (x86? PPC?), but for
> > arm, a read barrier (load/load and load/store) is enough.
> 
> Hi Ilya,
> 
> I would expand the rmb definition to cover load/store, in addition to
> load/load.
> For x86, with its strong memory ordering model, rmb is actually
> equivalent to mb and is implemented as a compiler barrier
> (rte_compiler_barrier); arm32 is the same case.
> For PPC, both 32- and 64-bit, rmb = wmb = mb: lwsync/sync orders
> load/store, load/load, store/load and store/store, per the table on
> this page:
> https://www.ibm.com/developerworks/systems/articles/powerpc.html
> 
> In summary, are we safe to expand this definition for all the
> architectures DPDK supports?
Essentially, it is a documentation bug: the current implementation of rte_smp_rmb() already behaves as a load/load and load/store barrier.
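
[Editor's note: a portable way to express the guarantee described above, without depending on each architecture's rte_smp_rmb() expansion, is a C11 acquire load of the opposing tail. This is a hedged sketch, not the DPDK API; the function name is hypothetical.]

```c
#include <stdint.h>
#include <stdatomic.h>

/*
 * Illustrative sketch: an acquire load of the opposing tail guarantees
 * that later loads and stores to the ring slots cannot be reordered
 * before it, on every architecture -- exactly the load/load plus
 * load/store property discussed in this thread.  On arm64 this typically
 * compiles to LDAR; on x86 to a plain MOV (the compiler barrier alone
 * suffices there).
 */
static inline uint32_t load_tail_acquire(_Atomic uint32_t *tail)
{
    return atomic_load_explicit(tail, memory_order_acquire);
}
```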

> Any comments are welcome!
> 
> BR. Gavin
> 

<snip>


More information about the dev mailing list