[v2,2/2] ring: move the atomic load of head above the loop

Message ID 1540956945-211373-3-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series rte ring c11 bug fix and optimization |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Gavin Hu Oct. 31, 2018, 3:35 a.m. UTC
  In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.

This helps a little on the latency,about 1~5%.

 Test result with the patch(two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.64
 MP/MC bulk enq/dequeue (size: 8): 9.58
 SP/SC bulk enq/dequeue (size: 32): 1.98
 MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
Reviewed-by: Jia He <justin.he@arm.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 doc/guides/rel_notes/release_18_11.rst |  7 +++++++
 lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
 2 files changed, 11 insertions(+), 6 deletions(-)
  

Comments

Thomas Monjalon Oct. 31, 2018, 9:36 a.m. UTC | #1
31/10/2018 04:35, Gavin Hu:
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> +* **Updated rte ring C11 driver.**
> +
> +  Updated the rte ring C11 driver including the following changes:

It is not a driver, it is the ring library with C11 memory model.
Please reword and move it after memory related notes (at the beginning).
Thanks
  
Gavin Hu Oct. 31, 2018, 10:27 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, October 31, 2018 5:37 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com; chaozhu@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> jerin.jacob@caviumnetworks.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v2 2/2] ring: move the atomic load of head above the
> loop
>
> 31/10/2018 04:35, Gavin Hu:
> > --- a/doc/guides/rel_notes/release_18_11.rst
> > +++ b/doc/guides/rel_notes/release_18_11.rst
> > +* **Updated rte ring C11 driver.**
> > +
> > +  Updated the rte ring C11 driver including the following changes:
>
> It is not a driver, it is the ring library with C11 memory model.
> Please reword and move it after memory related notes (at the beginning).
> Thanks
>

Done, v3 just submitted, thanks!

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  

Patch

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 376128f..3bbbe2a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -158,6 +158,13 @@  New Features
   * Support for runtime Rx and Tx queues setup.
   * Support multicast MAC address set.
 
+* **Updated rte ring C11 driver.**
+
+  Updated the rte ring C11 driver including the following changes:
+
+  * Synchronize the load and store of the tail
+  * Move the atomic load of head above the loop
+
 * **Added a devarg to use PCAP interface physical MAC address.**
   A new devarg ``phy_mac`` was introduced to allow users to use physical
   MAC address of the selected PCAP interface.
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 4851763..fdab2b9 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -60,13 +60,11 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;
 
+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->prod.head,
-					__ATOMIC_ACQUIRE);
-
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -92,6 +90,7 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		if (is_sp)
 			r->prod.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
 					0, __ATOMIC_ACQUIRE,
@@ -133,13 +132,11 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	int success;
 
 	/* move cons.head atomically */
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->cons.head,
-					__ATOMIC_ACQUIRE);
-
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -164,6 +161,7 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		if (is_sc)
 			r->cons.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 							old_head, *new_head,
 							0, __ATOMIC_ACQUIRE,