[v4,2/2] ring: move the atomic load of head above the loop

Message ID 1541066031-29125-3-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series ring library with c11 memory model bug fix and optimization |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Gavin Hu Nov. 1, 2018, 9:53 a.m. UTC
  In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.

This helps a little on the latency,about 1~5%.

 Test result with the patch(two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.64
 MP/MC bulk enq/dequeue (size: 8): 9.58
 SP/SC bulk enq/dequeue (size: 32): 1.98
 MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 doc/guides/rel_notes/release_18_11.rst |  7 +++++++
 lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
 2 files changed, 11 insertions(+), 6 deletions(-)
  

Comments

Stephen Hemminger Nov. 1, 2018, 5:26 p.m. UTC | #1
On Thu,  1 Nov 2018 17:53:51 +0800
Gavin Hu <gavin.hu@arm.com> wrote:

> +* **Updated the ring library with C11 memory model.**
> +
> +  Updated the ring library with C11 memory model including the following changes:
> +
> +  * Synchronize the load and store of the tail
> +  * Move the atomic load of head above the loop
> +

Does this really need to be in the release notes? Is it a user visible change
or just an internal/optimization and fix.
  
Gavin Hu Nov. 2, 2018, 12:53 a.m. UTC | #2
Hi Stephen,

There is no api changes, but this is a significant change as ring is fundamental and widely used, it decreases latency by 25% in our tests, it may do even better for cases with more contending producers/consumers or deeper depth of rings.

Best regards
Gavin

Best regards
Gavin
  
Honnappa Nagarahalli Nov. 2, 2018, 4:30 a.m. UTC | #3
<Fixing this to make the reply inline, making email plain text>

 
On Thu, 1 Nov 2018 17:53:51 +0800 
Gavin Hu <mailto:gavin.hu@arm.com> wrote: 

> +* **Updated the ring library with C11 memory model.** 
> + 
> + Updated the ring library with C11 memory model including the following changes: 
> + 
> + * Synchronize the load and store of the tail 
> + * Move the atomic load of head above the loop 
> + 

Does this really need to be in the release notes? Is it a user visible change 
or just an internal/optimization and fix. 

[Gavin] There is no api changes, but this is a significant change as ring is fundamental and widely used, it decreases latency by 25% in our tests, it may do even better for cases with more contending producers/consumers or deeper depth of rings.

[Honnappa] I agree with Stephen. Release notes should be written from DPDK user perspective. In the rte_ring case, the user has the option of choosing between c11 and non-c11 algorithms. Performance would be one of the criteria to choose between these 2 algorithms. IMO, it probably makes sense to indicate that the performance of c11 based algorithm has been improved. However, I do not know what DPDK has followed historically regarding performance optimizations. I would prefer to follow whatever has been followed so far.
I do not think that we need to document the details of the internal changes since it does not help the user make a decision.
  
Gavin Hu Nov. 2, 2018, 7:15 a.m. UTC | #4
> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Friday, November 2, 2018 12:31 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> Hemminger <stephen@networkplumber.org>
> Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> stable@dpdk.org; nd <nd@arm.com>
> Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
> 
> <Fixing this to make the reply inline, making email plain text>
> 
> 
> On Thu, 1 Nov 2018 17:53:51 +0800
> Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> 
> > +* **Updated the ring library with C11 memory model.**
> > +
> > + Updated the ring library with C11 memory model including the following
> changes:
> > +
> > + * Synchronize the load and store of the tail
> > + * Move the atomic load of head above the loop
> > +
> 
> Does this really need to be in the release notes? Is it a user visible change or
> just an internal/optimization and fix.
> 
> [Gavin] There is no api changes, but this is a significant change as ring is
> fundamental and widely used, it decreases latency by 25% in our tests, it may
> do even better for cases with more contending producers/consumers or
> deeper depth of rings.
> 
> [Honnappa] I agree with Stephen. Release notes should be written from
> DPDK user perspective. In the rte_ring case, the user has the option of
> choosing between c11 and non-c11 algorithms. Performance would be one
> of the criteria to choose between these 2 algorithms. IMO, it probably makes
> sense to indicate that the performance of c11 based algorithm has been
> improved. However, I do not know what DPDK has followed historically
> regarding performance optimizations. I would prefer to follow whatever has
> been followed so far.
> I do not think that we need to document the details of the internal changes
> since it does not help the user make a decision.

I read through the online guidelines for release notes, besides API and new features, resolved issues which are significant and not newly introduced in this release cycle, should also be included.
In this case, the resolved issue existed for long time, across multiple release cycles and ring is a core lib, so it should be a candidate for release notes. 

https://doc.dpdk.org/guides-18.08/contributing/patches.html
section 5.5 says: 
Important changes will require an addition to the release notes in doc/guides/rel_notes/. 
See the Release Notes section of the Documentation Guidelines for details.
https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
"Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
New Features
Resolved Issues (see below)
Known Issues
API Changes
ABI Changes
Shared Library Versions
Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."

     Suggested order in release notes items:
     * Core libs (EAL, mempool, ring, mbuf, buses)
     * Device abstraction libs and PMDs
  
Thomas Monjalon Nov. 2, 2018, 9:36 a.m. UTC | #5
02/11/2018 08:15, Gavin Hu (Arm Technology China):
> 
> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Friday, November 2, 2018 12:31 PM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > Hemminger <stephen@networkplumber.org>
> > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > stable@dpdk.org; nd <nd@arm.com>
> > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> > loop
> > 
> > <Fixing this to make the reply inline, making email plain text>
> > 
> > 
> > On Thu, 1 Nov 2018 17:53:51 +0800
> > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> > 
> > > +* **Updated the ring library with C11 memory model.**
> > > +
> > > + Updated the ring library with C11 memory model including the following
> > changes:
> > > +
> > > + * Synchronize the load and store of the tail
> > > + * Move the atomic load of head above the loop
> > > +
> > 
> > Does this really need to be in the release notes? Is it a user visible change or
> > just an internal/optimization and fix.
> > 
> > [Gavin] There is no api changes, but this is a significant change as ring is
> > fundamental and widely used, it decreases latency by 25% in our tests, it may
> > do even better for cases with more contending producers/consumers or
> > deeper depth of rings.
> > 
> > [Honnappa] I agree with Stephen. Release notes should be written from
> > DPDK user perspective. In the rte_ring case, the user has the option of
> > choosing between c11 and non-c11 algorithms. Performance would be one
> > of the criteria to choose between these 2 algorithms. IMO, it probably makes
> > sense to indicate that the performance of c11 based algorithm has been
> > improved. However, I do not know what DPDK has followed historically
> > regarding performance optimizations. I would prefer to follow whatever has
> > been followed so far.
> > I do not think that we need to document the details of the internal changes
> > since it does not help the user make a decision.
> 
> I read through the online guidelines for release notes, besides API and new features, resolved issues which are significant and not newly introduced in this release cycle, should also be included.
> In this case, the resolved issue existed for long time, across multiple release cycles and ring is a core lib, so it should be a candidate for release notes. 
> 
> https://doc.dpdk.org/guides-18.08/contributing/patches.html
> section 5.5 says: 
> Important changes will require an addition to the release notes in doc/guides/rel_notes/. 
> See the Release Notes section of the Documentation Guidelines for details.
> https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
> "Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
> New Features
> Resolved Issues (see below)
> Known Issues
> API Changes
> ABI Changes
> Shared Library Versions
> Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."
> 
>      Suggested order in release notes items:
>      * Core libs (EAL, mempool, ring, mbuf, buses)
>      * Device abstraction libs and PMDs

I agree with Honnappa.
You don't need to give details, but can explain that performance of
C11 version is improved.
  
Gavin Hu Nov. 2, 2018, 11:23 a.m. UTC | #6
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 2, 2018 5:37 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org;
> olivier.matz@6wind.com; chaozhu@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> jerin.jacob@caviumnetworks.com; stable@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
> 
> 02/11/2018 08:15, Gavin Hu (Arm Technology China):
> >
> > > -----Original Message-----
> > > From: Honnappa Nagarahalli
> > > Sent: Friday, November 2, 2018 12:31 PM
> > > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > > Hemminger <stephen@networkplumber.org>
> > > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > > stable@dpdk.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above
> > > the loop
> > >
> > > <Fixing this to make the reply inline, making email plain text>
> > >
> > >
> > > On Thu, 1 Nov 2018 17:53:51 +0800
> > > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> > >
> > > > +* **Updated the ring library with C11 memory model.**
> > > > +
> > > > + Updated the ring library with C11 memory model including the
> > > > + following
> > > changes:
> > > > +
> > > > + * Synchronize the load and store of the tail
> > > > + * Move the atomic load of head above the loop
> > > > +
> > >
> > > Does this really need to be in the release notes? Is it a user
> > > visible change or just an internal/optimization and fix.
> > >
> > > [Gavin] There is no api changes, but this is a significant change as
> > > ring is fundamental and widely used, it decreases latency by 25% in
> > > our tests, it may do even better for cases with more contending
> > > producers/consumers or deeper depth of rings.
> > >
> > > [Honnappa] I agree with Stephen. Release notes should be written
> > > from DPDK user perspective. In the rte_ring case, the user has the
> > > option of choosing between c11 and non-c11 algorithms. Performance
> > > would be one of the criteria to choose between these 2 algorithms.
> > > IMO, it probably makes sense to indicate that the performance of c11
> > > based algorithm has been improved. However, I do not know what DPDK
> > > has followed historically regarding performance optimizations. I
> > > would prefer to follow whatever has been followed so far.
> > > I do not think that we need to document the details of the internal
> > > changes since it does not help the user make a decision.
> >
> > I read through the online guidelines for release notes, besides API and new
> features, resolved issues which are significant and not newly introduced in
> this release cycle, should also be included.
> > In this case, the resolved issue existed for long time, across multiple
> release cycles and ring is a core lib, so it should be a candidate for release
> notes.
> >
> > https://doc.dpdk.org/guides-18.08/contributing/patches.html
> > section 5.5 says:
> > Important changes will require an addition to the release notes in
> doc/guides/rel_notes/.
> > See the Release Notes section of the Documentation Guidelines for details.
> > https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-
> > guidelines "Developers should include updates to the Release Notes
> > with patch sets that relate to any of the following sections:
> > New Features
> > Resolved Issues (see below)
> > Known Issues
> > API Changes
> > ABI Changes
> > Shared Library Versions
> > Resolved Issues should only include issues from previous releases that
> have been resolved in the current release. Issues that are introduced and
> then fixed within a release cycle do not have to be included here."
> >
> >      Suggested order in release notes items:
> >      * Core libs (EAL, mempool, ring, mbuf, buses)
> >      * Device abstraction libs and PMDs
> 
> I agree with Honnappa.
> You don't need to give details, but can explain that performance of
> C11 version is improved.
> 

V5 was submitted to indicate the improvement by the change, without giving more technical details, please have a review, thanks!
  

Patch

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 376128f..c9c2b86 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -69,6 +69,13 @@  New Features
   checked out against that dma mask and rejected if out of range. If more than
   one device has addressing limitations, the dma mask is the more restricted one.
 
+* **Updated the ring library with C11 memory model.**
+
+  Updated the ring library with C11 memory model including the following changes:
+
+  * Synchronize the load and store of the tail
+  * Move the atomic load of head above the loop
+
 * **Added hot-unplug handle mechanism.**
 
   ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 52da95a..7bc74a4 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -61,13 +61,11 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;
 
+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->prod.head,
-					__ATOMIC_ACQUIRE);
-
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -93,6 +91,7 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		if (is_sp)
 			r->prod.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
 					0, __ATOMIC_ACQUIRE,
@@ -135,13 +134,11 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	int success;
 
 	/* move cons.head atomically */
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->cons.head,
-					__ATOMIC_ACQUIRE);
-
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -166,6 +163,7 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		if (is_sc)
 			r->cons.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 							old_head, *new_head,
 							0, __ATOMIC_ACQUIRE,