[dpdk-stable] patch 'spinlock: reimplement with atomic one-way barrier' has been queued to LTS release 18.11.2

Kevin Traynor ktraynor at redhat.com
Tue Apr 16 16:37:16 CEST 2019

Previous message: [dpdk-stable] patch 'test/spinlock: amortize the cost of getting time' has been queued to LTS release 18.11.2
Next message: [dpdk-stable] patch 'rwlock: reimplement with atomic builtins' has been queued to LTS release 18.11.2
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

FYI, your patch has been queued to LTS release 18.11.2

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 04/24/19. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Thanks.

Kevin Traynor

---
>From 548edf47ab0869243541b1c855d1e9330b4718a7 Mon Sep 17 00:00:00 2001
From: Gavin Hu <gavin.hu at arm.com>
Date: Fri, 8 Mar 2019 15:56:37 +0800
Subject: [PATCH] spinlock: reimplement with atomic one-way barrier

[ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ]

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way
barriers.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4
<rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0
<rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

The benchmarking results showed constant improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us

In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Gavin Hu <gavin.hu at arm.com>
Reviewed-by: Phil Yang <phil.yang at arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl at arm.com>
Reviewed-by: Steve Capper <steve.capper at arm.com>
Reviewed-by: Jerin Jacob <jerinj at marvell.com>
Acked-by: Nipun Gupta <nipun.gupta at nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
---
 .../common/include/generic/rte_spinlock.h      | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index c4c3fc31e..87ae7a4f1 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -62,7 +62,12 @@ static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while(sl->locked)
+	int exp = 0;
+
+	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
 			rte_pause();
+		exp = 0;
+	}
 }
 #endif
@@ -81,5 +86,5 @@ static inline void
 rte_spinlock_unlock (rte_spinlock_t *sl)
 {
-	__sync_lock_release(&sl->locked);
+	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
 }
 #endif
@@ -100,5 +105,8 @@ static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl)
 {
-	return __sync_lock_test_and_set(&sl->locked,1) == 0;
+	int exp = 0;
+	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+				0, /* disallow spurious failure */
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
 }
 #endif
@@ -114,5 +122,5 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
 static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
 {
-	return sl->locked;
+	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
 }
 
-- 
2.20.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2019-04-16 15:34:27.777024964 +0100
+++ 0058-spinlock-reimplement-with-atomic-one-way-barrier.patch	2019-04-16 15:34:25.236178707 +0100
@@ -1,8 +1,10 @@
-From 453d8f736676255696bce3c5e01b43b543ff896c Mon Sep 17 00:00:00 2001
+From 548edf47ab0869243541b1c855d1e9330b4718a7 Mon Sep 17 00:00:00 2001
 From: Gavin Hu <gavin.hu at arm.com>
 Date: Fri, 8 Mar 2019 15:56:37 +0800
 Subject: [PATCH] spinlock: reimplement with atomic one-way barrier
 
+[ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ]
+
 The __sync builtin based implementation generates full memory barriers
 ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way
 barriers.
@@ -71,7 +73,6 @@
 than __sync.
 
 Fixes: af75078fece3 ("first public release")
-Cc: stable at dpdk.org
 
 Signed-off-by: Gavin Hu <gavin.hu at arm.com>
 Reviewed-by: Phil Yang <phil.yang at arm.com>

Previous message: [dpdk-stable] patch 'test/spinlock: amortize the cost of getting time' has been queued to LTS release 18.11.2
Next message: [dpdk-stable] patch 'rwlock: reimplement with atomic builtins' has been queued to LTS release 18.11.2
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the stable mailing list