[v3,4/5] spinlock: use wfe to reduce contention on aarch64

Message ID 1563896626-44862-5-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series: use WFE for locks and ring on aarch64

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Gavin Hu July 23, 2019, 3:43 p.m. UTC
  While acquiring a spinlock, cores repeatedly poll the lock variable;
this busy polling is replaced by the rte_wait_until_equal API.

A 5~10% performance gain was measured by running spinlock_autotest on
14 isolated cores of ThunderX2.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 .../common/include/arch/arm/rte_spinlock.h         | 25 ++++++++++++++++++++++
 .../common/include/generic/rte_spinlock.h          |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)
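
To illustrate the busy polling this patch addresses, here is a minimal sketch of a generic spin-on-exchange lock in portable C11 atomics, roughly what the intrinsics-based path boils down to. The demo_* names are illustrative, not DPDK API:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal busy-polling spinlock sketch (illustrative demo_* names,
 * not DPDK API): the core spins on an atomic exchange until the
 * previous holder releases the lock. */
typedef struct {
	atomic_int locked; /* 0 = free, 1 = held */
} demo_spinlock_t;

static void
demo_lock(demo_spinlock_t *sl)
{
	/* Each failed exchange hits the lock's cache line, burning
	 * power and coherence bandwidth under contention. */
	while (atomic_exchange_explicit(&sl->locked, 1,
					memory_order_acquire))
		;
}

static void
demo_unlock(demo_spinlock_t *sl)
{
	atomic_store_explicit(&sl->locked, 0, memory_order_release);
}

static demo_spinlock_t demo_sl;
static long demo_counter;

static void *
demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		demo_lock(&demo_sl);
		demo_counter++; /* protected by the lock */
		demo_unlock(&demo_sl);
	}
	return NULL;
}

long
demo_run(void)
{
	pthread_t t[4];

	demo_counter = 0;
	atomic_store(&demo_sl.locked, 0);
	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, demo_worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return demo_counter; /* 4 threads x 100000 increments */
}
```

On aarch64, WFE lets a waiting core sleep until the lock's cache line is written, instead of spinning in this hot loop.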
  

Comments

Jerin Jacob Kollanukkaran July 24, 2019, 12:17 p.m. UTC | #1
> -----Original Message-----
> From: Gavin Hu <gavin.hu@arm.com>
> Sent: Tuesday, July 23, 2019 9:14 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; thomas@monjalon.net; stephen@networkplumber.org;
> Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Pavan Nikhilesh
> Bhagavatula <pbhagavatula@marvell.com>;
> Honnappa.Nagarahalli@arm.com; gavin.hu@arm.com
> Subject: [EXT] [PATCH v3 4/5] spinlock: use wfe to reduce contention on
> aarch64
> 
> While acquiring a spinlock, cores repeatedly poll the lock variable;
> this busy polling is replaced by the rte_wait_until_equal API.
> 
> A 5~10% performance gain was measured by running spinlock_autotest on
> 14 isolated cores of ThunderX2.
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  .../common/include/arch/arm/rte_spinlock.h         | 25 ++++++++++++++++++++++
>  .../common/include/generic/rte_spinlock.h          |  2 +-
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> index 1a6916b..f25d17f 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> @@ -16,6 +16,31 @@ extern "C" {
>  #include <rte_common.h>
>  #include "generic/rte_spinlock.h"
> 
> +/* armv7a does support WFE, but an explicit wake-up signal using SEV is
> + * required (must be preceded by DSB to drain the store buffer) and
> + * this is less performant, so keep armv7a implementation unchanged.
> + */
> +#if defined(RTE_USE_WFE) && defined(RTE_ARCH_ARM64)
> +static inline void

See below. Please avoid complicated conditional compilation logic for scalability and readability.

> +rte_spinlock_lock(rte_spinlock_t *sl)
> +{
> +	unsigned int tmp;
> +	/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
> +	 * faqs/ka16809.html
> +	 */
> +	asm volatile(
> +		"sevl\n"
> +		"1:	wfe\n"
> +		"2:	ldaxr %w[tmp], %w[locked]\n"
> +		"cbnz   %w[tmp], 1b\n"
> +		"stxr   %w[tmp], %w[one], %w[locked]\n"
> +		"cbnz   %w[tmp], 2b\n"
> +		: [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
> +		: [one] "r" (1)
> +		: "cc", "memory");
> +}
> +#endif
> +
>  static inline int rte_tm_supported(void)  {
>  	return 0;
> diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h
> b/lib/librte_eal/common/include/generic/rte_spinlock.h
> index 87ae7a4..cf4f15b 100644
> --- a/lib/librte_eal/common/include/generic/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
> @@ -57,7 +57,7 @@ rte_spinlock_init(rte_spinlock_t *sl)
>  static inline void
>  rte_spinlock_lock(rte_spinlock_t *sl);
> 
> -#ifdef RTE_FORCE_INTRINSICS
> +#if defined(RTE_FORCE_INTRINSICS) && !defined(RTE_USE_WFE)

I would like to avoid hacking the generic code to meet an architecture-specific requirement.
For example, if someone enables RTE_USE_WFE for armv7 it will break,
and it will be painful for a new architecture to use RTE_FORCE_INTRINSICS.

Looks like the time has come to disable RTE_FORCE_INTRINSICS for arm64.

Since this patch is targeted for the next release, how about enabling a native
implementation of the RTE_FORCE_INTRINSICS-based code for arm64, such as spinlock
and ticketlock, like x86 does? If you don't have the bandwidth to convert all the
blocks, let us know; we can collaborate, and Marvell can take up some of the
RTE_FORCE_INTRINSICS conversion for the next release.


>  static inline void
>  rte_spinlock_lock(rte_spinlock_t *sl)
>  {
> --
> 2.7.4
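
For readers unfamiliar with the exclusive-monitor instructions quoted above, the following is a portable C sketch of what the sevl/wfe/ldaxr/stxr loop does. WFE/SEVL are modeled as plain re-polling here (on real aarch64 the core sleeps until the exclusive monitor observes a write to the lock's cache line), and the wfe_* names are illustrative, not DPDK API:

```c
#include <stdatomic.h>

/* C sketch of the sevl/wfe/ldaxr/stxr acquire loop (illustrative
 * wfe_* names, not DPDK API). WFE is modeled as plain re-polling;
 * the real instruction parks the core until the monitored cache
 * line changes. */
typedef struct {
	atomic_uint locked; /* 0 = free, 1 = held */
} wfe_spinlock_t;

static void
wfe_style_lock(wfe_spinlock_t *sl)
{
	unsigned int expected;

	for (;;) {
		/* "1: wfe" + "2: ldaxr": load-acquire the lock word.
		 * "cbnz %w[tmp], 1b": if held, go back to sleep and
		 * retry after the next wake-up event. */
		if (atomic_load_explicit(&sl->locked,
					 memory_order_acquire) != 0)
			continue;
		/* "stxr": the store-exclusive fails if another core
		 * wrote the line since our exclusive load; a weak CAS
		 * models that. "cbnz %w[tmp], 2b": on failure, reload
		 * and retry. */
		expected = 0;
		if (atomic_compare_exchange_weak_explicit(&sl->locked,
				&expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

unsigned int
wfe_demo(void)
{
	wfe_spinlock_t sl = { 0 };

	wfe_style_lock(&sl); /* uncontended acquire succeeds */
	return atomic_load(&sl.locked);
}
```

The initial "sevl" in the patch primes the event register so that the very first "wfe" falls straight through instead of sleeping before the lock has even been examined.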
  

Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
index 1a6916b..f25d17f 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
@@ -16,6 +16,31 @@  extern "C" {
 #include <rte_common.h>
 #include "generic/rte_spinlock.h"
 
+/* armv7a does support WFE, but an explicit wake-up signal using SEV is
+ * required (must be preceded by DSB to drain the store buffer) and
+ * this is less performant, so keep armv7a implementation unchanged.
+ */
+#if defined(RTE_USE_WFE) && defined(RTE_ARCH_ARM64)
+static inline void
+rte_spinlock_lock(rte_spinlock_t *sl)
+{
+	unsigned int tmp;
+	/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
+	 * faqs/ka16809.html
+	 */
+	asm volatile(
+		"sevl\n"
+		"1:	wfe\n"
+		"2:	ldaxr %w[tmp], %w[locked]\n"
+		"cbnz   %w[tmp], 1b\n"
+		"stxr   %w[tmp], %w[one], %w[locked]\n"
+		"cbnz   %w[tmp], 2b\n"
+		: [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
+		: [one] "r" (1)
+		: "cc", "memory");
+}
+#endif
+
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index 87ae7a4..cf4f15b 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -57,7 +57,7 @@  rte_spinlock_init(rte_spinlock_t *sl)
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl);
 
-#ifdef RTE_FORCE_INTRINSICS
+#if defined(RTE_FORCE_INTRINSICS) && !defined(RTE_USE_WFE)
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {