[v2,3/3] test/ring_perf: replace sync builtins with atomic builtins

Message ID 1553856998-25394-4-git-send-email-phil.yang@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series example and test cases optimizations

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Phil Yang March 29, 2019, 10:56 a.m. UTC
The '__sync' built-in functions are deprecated; the '__atomic'
built-ins should be used instead. The '__sync' built-ins are full
barriers, while the '__atomic' built-ins offer less restrictive one-way
barriers, which helps performance.
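
For illustration only, a minimal sketch of the difference between the two
builtin families, assuming GCC or Clang. The helper names (inc_sync,
inc_atomic_relaxed, inc_atomic_acq_rel, inc_demo) are hypothetical and are
not part of this patch. The '__sync' form takes no ordering argument, while
the '__atomic' form makes the caller state the memory order explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy builtin: no memory-order argument, treated as a full barrier. */
static inline uint32_t
inc_sync(uint32_t *counter)
{
	return __sync_add_and_fetch(counter, 1);
}

/* __atomic builtin, relaxed: atomic increment with no ordering constraint. */
static inline uint32_t
inc_atomic_relaxed(uint32_t *counter)
{
	return __atomic_add_fetch(counter, 1, __ATOMIC_RELAXED);
}

/* __atomic builtin, acquire/release: orders surrounding memory accesses. */
static inline uint32_t
inc_atomic_acq_rel(uint32_t *counter)
{
	return __atomic_add_fetch(counter, 1, __ATOMIC_ACQ_REL);
}

/* Single-threaded check: all three return the value after the increment. */
void
inc_demo(void)
{
	uint32_t n = 0;

	assert(inc_sync(&n) == 1);
	assert(inc_atomic_relaxed(&n) == 2);
	assert(inc_atomic_acq_rel(&n) == 3);
}
```

All three helpers return the same values; they differ only in the ordering
guarantees they give to other threads, which is where any performance
difference would come from.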

Here is an example test result on TX2:
sudo ./arm64-armv8a-linuxapp-gcc/app/test -c 0x7fffffe \
-n 4 --socket-mem=1024,0 --file-prefix=~ -- -i
RTE>>ring_perf_autotest

*** ring_perf_autotest without this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.22
MP/MC bulk enq/dequeue (size: 8): 11.50
SP/SC bulk enq/dequeue (size: 32): 1.85
MP/MC bulk enq/dequeue (size: 32): 2.66

*** ring_perf_autotest with this patch ***
SP/SC bulk enq/dequeue (size: 8): 6.13
MP/MC bulk enq/dequeue (size: 8): 9.83
SP/SC bulk enq/dequeue (size: 32): 1.96
MP/MC bulk enq/dequeue (size: 32): 2.30

So for the ring performance test, this patch improves the performance
of ring operations by 11%.
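
The commit message does not say how the 11% figure was aggregated. As a
rough aid, the per-case change can be computed from the numbers listed
above. The sketch below assumes that lower values are better (a cost per
operation); the names pct_reduction, ring_perf_cases and
print_ring_perf_deltas are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative reduction in percent; positive means "after" is lower. */
static inline double
pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Results copied from the commit message above. */
static const struct {
	const char *name;
	double before;
	double after;
} ring_perf_cases[] = {
	{ "SP/SC bulk enq/dequeue (size: 8)",   6.22, 6.13 },
	{ "MP/MC bulk enq/dequeue (size: 8)",  11.50, 9.83 },
	{ "SP/SC bulk enq/dequeue (size: 32)",  1.85, 1.96 },
	{ "MP/MC bulk enq/dequeue (size: 32)",  2.66, 2.30 },
};

/* Print the relative change for each case, one per line. */
void
print_ring_perf_deltas(void)
{
	size_t i;
	size_t n = sizeof(ring_perf_cases) / sizeof(ring_perf_cases[0]);

	for (i = 0; i < n; i++)
		printf("%s: %+.1f%%\n", ring_perf_cases[i].name,
		       pct_reduction(ring_perf_cases[i].before,
				     ring_perf_cases[i].after));
}
```

Calling print_ring_perf_deltas() lists each case separately, which makes it
easy to see which cases improved and which did not.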

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 app/test/test_ring_perf.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
  

Comments

Honnappa Nagarahalli April 1, 2019, 4:24 p.m. UTC | #1
<snip>

> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index ebb3939..e851c1a 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -160,7 +160,11 @@ enqueue_bulk(void *p)
>  	unsigned i;
>  	void *burst[MAX_BURST] = {0};
> 
> -	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
> +#ifdef RTE_USE_C11_MEM_MODEL
> +	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
> +#else
> +	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
> +#endif
>  		while(lcore_count != 2)
>  			rte_pause();
Since the rte_ring library has both C11 and non-C11 implementations, conditional compilation should be fine here.
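
As a sketch only, the same conditional could live in one small helper
instead of being repeated in enqueue_bulk() and dequeue_bulk(). The helper
name lcore_count_inc is hypothetical and not part of this patch, and the
lcore_count declaration below is a stand-in that assumes a volatile
unsigned counter; the real one is in test_ring_perf.c.

```c
/* Stand-in for the counter in test_ring_perf.c (declaration assumed). */
static volatile unsigned int lcore_count;

/*
 * Hypothetical helper: select the builtin in one place, following the
 * same RTE_USE_C11_MEM_MODEL switch used in the patch.
 * Returns the counter value after the increment.
 */
static inline unsigned int
lcore_count_inc(void)
{
#ifdef RTE_USE_C11_MEM_MODEL
	return __atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED);
#else
	return __sync_add_and_fetch(&lcore_count, 1);
#endif
}
```

Both call sites would then test `lcore_count_inc() != 2` before spinning,
whichever implementation is compiled in.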

> 
> @@ -196,7 +200,11 @@ dequeue_bulk(void *p)
>  	unsigned i;
>  	void *burst[MAX_BURST] = {0};
> 
> -	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
> +#ifdef RTE_USE_C11_MEM_MODEL
> +	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
> +#else
> +	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
> +#endif
>  		while(lcore_count != 2)
>  			rte_pause();
> 
> --
> 2.7.4
  

Patch

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index ebb3939..e851c1a 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -160,7 +160,11 @@  enqueue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
 
-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();
 
@@ -196,7 +200,11 @@  dequeue_bulk(void *p)
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
 
-	if ( __sync_add_and_fetch(&lcore_count, 1) != 2 )
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
 		while(lcore_count != 2)
 			rte_pause();