[v3,5/5] config: add WFE config entry for aarch64

Message ID 1563896626-44862-6-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series use WFE for locks and ring on aarch64 |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Gavin Hu July 23, 2019, 3:43 p.m. UTC
  Add the RTE_USE_WFE configuration entry for aarch64, disabled by default.
It can be enabled selectively based on the performance benchmarking.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 config/arm/meson.build     | 1 +
 config/common_armv8a_linux | 6 ++++++
 2 files changed, 7 insertions(+)
  

Comments

Stephen Hemminger July 23, 2019, 6:05 p.m. UTC | #1
On Tue, 23 Jul 2019 23:43:46 +0800
Gavin Hu <gavin.hu@arm.com> wrote:

> Add the RTE_USE_WFE configuration entry for aarch64, disabled by default.
> It can be enabled selectively based on the performance benchmarking.
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  config/arm/meson.build     | 1 +
>  config/common_armv8a_linux | 6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 979018e..496813a 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -116,6 +116,7 @@ impl_dpaa = ['NXP DPAA', flags_dpaa, machine_args_generic]
>  impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic]
>  
>  dpdk_conf.set('RTE_FORCE_INTRINSICS', 1)
> +dpdk_conf.set('RTE_USE_WFE', 0)
>  
>  if not dpdk_conf.get('RTE_ARCH_64')
>  	dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
> diff --git a/config/common_armv8a_linux b/config/common_armv8a_linux
> index 481712e..48c7ab5 100644
> --- a/config/common_armv8a_linux
> +++ b/config/common_armv8a_linux
> @@ -12,6 +12,12 @@ CONFIG_RTE_ARCH_64=y
>  
>  CONFIG_RTE_FORCE_INTRINSICS=y
>  
> +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> +# calling these APIs put the cores enter low power state while waiting
> +# for the memory address to be become equal to the expected value.
> +# This is supported only by aarch64.
> +CONFIG_RTE_USE_WFE=n
> +
>  # Maximum available cache line size in arm64 implementations.
>  # Setting to maximum available cache line size in generic config
>  # to address minimum DMA alignment across all arm64 implementations.

Introducing config options is a maintenance nightmare.
How are distributions supposed to ship a package?
Does full regression test get done on both options?

The user should not be able to change this.
  
Honnappa Nagarahalli July 23, 2019, 7:10 p.m. UTC | #2
> 
> On Tue, 23 Jul 2019 23:43:46 +0800
> Gavin Hu <gavin.hu@arm.com> wrote:
> 
> > Add the RTE_USE_WFE configuration entry for aarch64, disabled by default.
> > It can be enabled selectively based on the performance benchmarking.
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> >  config/arm/meson.build     | 1 +
> >  config/common_armv8a_linux | 6 ++++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/config/arm/meson.build b/config/arm/meson.build index
> > 979018e..496813a 100644
> > --- a/config/arm/meson.build
> > +++ b/config/arm/meson.build
> > @@ -116,6 +116,7 @@ impl_dpaa = ['NXP DPAA', flags_dpaa,
> > machine_args_generic]
> >  impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic]
> >
> >  dpdk_conf.set('RTE_FORCE_INTRINSICS', 1)
> > +dpdk_conf.set('RTE_USE_WFE', 0)
> >
> >  if not dpdk_conf.get('RTE_ARCH_64')
> >  	dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64) diff --git
> > a/config/common_armv8a_linux b/config/common_armv8a_linux index
> > 481712e..48c7ab5 100644
> > --- a/config/common_armv8a_linux
> > +++ b/config/common_armv8a_linux
> > @@ -12,6 +12,12 @@ CONFIG_RTE_ARCH_64=y
> >
> >  CONFIG_RTE_FORCE_INTRINSICS=y
> >
> > +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> > +# calling these APIs put the cores enter low power state while
> > +waiting # for the memory address to be become equal to the expected value.
> > +# This is supported only by aarch64.
> > +CONFIG_RTE_USE_WFE=n
> > +
> >  # Maximum available cache line size in arm64 implementations.
> >  # Setting to maximum available cache line size in generic config  #
> > to address minimum DMA alignment across all arm64 implementations.
> 
> Introducing config options is a maintenance nightmare.
> How are distributions supposed to ship a package?
> Does full regression test get done on both options?
> 
> The user should not be able to change this.
Agree with these concerns here. In our tests, we are finding that this patch does not result in performance improvements on all micro-architectures. May be these micro-architectures will evolve in the future knowing that WFE is being used in DPDK. But at this point, it does not make sense to enable this by default. This means additional testing/regression with the flag enabled. We could add this to Travis build (Travis yml file).

Currently, this patch will address use cases where the target hardware/environment is known during compilation.
  
Jerin Jacob Kollanukkaran July 24, 2019, 12:25 p.m. UTC | #3
> -----Original Message-----
> From: Gavin Hu <gavin.hu@arm.com>
> Sent: Tuesday, July 23, 2019 9:14 PM
> To: dev@dpdk.org
> Cc: nd@arm.com; thomas@monjalon.net; stephen@networkplumber.org;
> Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Pavan Nikhilesh
> Bhagavatula <pbhagavatula@marvell.com>;
> Honnappa.Nagarahalli@arm.com; gavin.hu@arm.com
> Subject: [EXT] [PATCH v3 5/5] config: add WFE config entry for aarch64
> 
> Add the RTE_USE_WFE configuration entry for aarch64, disabled by default.
> It can be enabled selectively based on the performance benchmarking.
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs, #
> +calling these APIs put the cores enter low power state while waiting #
> +for the memory address to be become equal to the expected value.
> +# This is supported only by aarch64.
> +CONFIG_RTE_USE_WFE=n

# If it specific for arm and none of the other architectures supports it then
I would like to change the config as CONFIG_RTE_ARM_USE_WFE
# Even if it is disabled, have the =n entry in config/common_base to know
all supported configs in DPDK in one place.
# Arrange all CONFIG_RTE_ARM_* together in config/common_base
  
Gavin Hu July 24, 2019, 5:59 p.m. UTC | #4
Hi Stephen,
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, July 24, 2019 3:10 AM
> To: Stephen Hemminger <stephen@networkplumber.org>; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net;
> jerinj@marvell.com; pbhagavatula@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 5/5] config: add WFE config entry for aarch64
> 
> >
> > On Tue, 23 Jul 2019 23:43:46 +0800
> > Gavin Hu <gavin.hu@arm.com> wrote:
> >
> > > Add the RTE_USE_WFE configuration entry for aarch64, disabled by
> default.
> > > It can be enabled selectively based on the performance benchmarking.
> > >
> > > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > > ---
> > >  config/arm/meson.build     | 1 +
> > >  config/common_armv8a_linux | 6 ++++++
> > >  2 files changed, 7 insertions(+)
> > >
> > > diff --git a/config/arm/meson.build b/config/arm/meson.build index
> > > 979018e..496813a 100644
> > > --- a/config/arm/meson.build
> > > +++ b/config/arm/meson.build
> > > @@ -116,6 +116,7 @@ impl_dpaa = ['NXP DPAA', flags_dpaa,
> > > machine_args_generic]
> > >  impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic]
> > >
> > >  dpdk_conf.set('RTE_FORCE_INTRINSICS', 1)
> > > +dpdk_conf.set('RTE_USE_WFE', 0)
> > >
> > >  if not dpdk_conf.get('RTE_ARCH_64')
> > >  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64) diff --git
> > > a/config/common_armv8a_linux b/config/common_armv8a_linux index
> > > 481712e..48c7ab5 100644
> > > --- a/config/common_armv8a_linux
> > > +++ b/config/common_armv8a_linux
> > > @@ -12,6 +12,12 @@ CONFIG_RTE_ARCH_64=y
> > >
> > >  CONFIG_RTE_FORCE_INTRINSICS=y
> > >
> > > +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> > > +# calling these APIs put the cores enter low power state while
> > > +waiting # for the memory address to be become equal to the expected
> value.
> > > +# This is supported only by aarch64.
> > > +CONFIG_RTE_USE_WFE=n
> > > +
> > >  # Maximum available cache line size in arm64 implementations.
> > >  # Setting to maximum available cache line size in generic config  #
> > > to address minimum DMA alignment across all arm64 implementations.
> >
> > Introducing config options is a maintenance nightmare.
> > How are distributions supposed to ship a package?
> > Does full regression test get done on both options?
> >
> > The user should not be able to change this.
> Agree with these concerns here. In our tests, we are finding that this patch
> does not result in performance improvements on all micro-architectures.
> May be these micro-architectures will evolve in the future knowing that
> WFE is being used in DPDK. But at this point, it does not make sense to
> enable this by default. This means additional testing/regression with the flag
> enabled. We could add this to Travis build (Travis yml file).
> 
> Currently, this patch will address use cases where the target
> hardware/environment is known during compilation.
In our testing, like running testpmd and packet_ordering(for WFE ring benchmarking), it showed no improvements nor degradation in performance. 
For some micro-benchmarking, it showed slight improvements sometimes, no degradation were seen.
The added benefit of the patch set is power saving, but it is not a primary concern in DPDK and we are short of measurement ways to benchmark that.
  

Patch

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 979018e..496813a 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,7 @@  impl_dpaa = ['NXP DPAA', flags_dpaa, machine_args_generic]
 impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic]
 
 dpdk_conf.set('RTE_FORCE_INTRINSICS', 1)
+dpdk_conf.set('RTE_USE_WFE', 0)
 
 if not dpdk_conf.get('RTE_ARCH_64')
 	dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
diff --git a/config/common_armv8a_linux b/config/common_armv8a_linux
index 481712e..48c7ab5 100644
--- a/config/common_armv8a_linux
+++ b/config/common_armv8a_linux
@@ -12,6 +12,12 @@  CONFIG_RTE_ARCH_64=y
 
 CONFIG_RTE_FORCE_INTRINSICS=y
 
+# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
+# calling these APIs put the cores enter low power state while waiting
+# for the memory address to be become equal to the expected value.
+# This is supported only by aarch64.
+CONFIG_RTE_USE_WFE=n
+
 # Maximum available cache line size in arm64 implementations.
 # Setting to maximum available cache line size in generic config
 # to address minimum DMA alignment across all arm64 implementations.