[dpdk-stable] [PATCH v5 2/2] ring: move the atomic load of head above the loop

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Sat Nov 3 10:34:36 CET 2018


> > > ---
> > >  doc/guides/rel_notes/release_18_11.rst |  7 +++++++
> > >  lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
> > >  2 files changed, 11 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/doc/guides/rel_notes/release_18_11.rst
> > > b/doc/guides/rel_notes/release_18_11.rst
> > > index 376128f..b68afab 100644
> > > --- a/doc/guides/rel_notes/release_18_11.rst
> > > +++ b/doc/guides/rel_notes/release_18_11.rst
> > > @@ -69,6 +69,13 @@ New Features
> > >    checked out against that dma mask and rejected if out of range.
> > > If more
> > than
> > >    one device has addressing limitations, the dma mask is the more
> > restricted one.
> > >
> > > +* **Updated the ring library with C11 memory model.**
> > > +
> > > +  Updated the ring library with C11 memory model, in our tests the
> > > + changes  decreased latency by 27~29% and 3~15% for MPMC and SPSC
> > cases respectively.
> > > +  The real improvements may vary with the number of contending
> > > + lcores and the  size of ring.
> > > +
> > Is this a little misleading, and will users expect massive performance
> > improvements generally? The C11 model seems to be used only on some,
> > but not all, arm platforms, and then only with "make" builds.
> >
> > config/arm/meson.build: ['RTE_USE_C11_MEM_MODEL', false]]
This is an error. There is already an agreement that on Arm based platforms, C11 memory model would be used by default. Specific platforms can override it if required.
Would this be ab acceptable change for RC2 or RC3?

> > config/common_armv8a_linuxapp:CONFIG_RTE_USE_C11_MEM_MODEL=y
> > config/common_base:CONFIG_RTE_USE_C11_MEM_MODEL=n
> > config/defconfig_arm64-thunderx-linuxapp-
> > gcc:CONFIG_RTE_USE_C11_MEM_MODEL=n
> >
> > /Bruce
> 
> Thank you Bruce for the review, to limit the scope of improvement, I rewrite
> the note as follows, could you help review? Feel free to change anything if you
> like.
> " Updated the ring library with C11 memory model, running
> ring_perf_autotest on Cavium ThunderX2 platform, the changes  decreased
> latency by 27~29% and 3~15% for MPMC and SPSC cases (2 lcores)
> respectively. Note the changes help the relaxed memory ordering
> architectures (arm, ppc) only when CONFIG_RTE_USE_C11_MEM_MODEL=y
> was configured, no impact on strong memory ordering architectures like x86.
> To what extent they help the real use cases depends on other factors, like the
> number of contending readers/writers, size of the ring, whether or not it is on
> the critical path."
> 
> /Gavin
IMO, mentioning the performance numbers requires mentioning system configurations. I suggest we keep this somewhat vague (which will make the users to run the test on their specific platform) and simple. Can I suggest the following:

"C11 memory model algorithm for ring library is updated. This results in improved performance on some Arm based platforms."


More information about the stable mailing list