[dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method

Neil Horman nhorman at tuxdriver.com
Wed Dec 17 21:27:43 CET 2014


On Wed, Dec 17, 2014 at 07:22:06PM +0000, Ananyev, Konstantin wrote:
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Wednesday, December 17, 2014 3:33 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > 
> > On Tue, Dec 16, 2014 at 04:16:48PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Monday, December 15, 2014 8:21 PM
> > > > To: Ananyev, Konstantin
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > > >
> > > > On Mon, Dec 15, 2014 at 04:33:47PM +0000, Ananyev, Konstantin wrote:
> > > > > Hi Neil,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > Sent: Monday, December 15, 2014 4:00 PM
> > > > > > To: Ananyev, Konstantin
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > > > > >
> > > > > > On Sun, Dec 14, 2014 at 06:10:52PM +0000, Konstantin Ananyev wrote:
> > > > > > > Introduce new classify() method that uses AVX2 instructions.
> > > > > > > From my measurements:
> > > > > > > On HSW boards when processing >= 16 packets per call,
> > > > > > > AVX2 method outperforms it's SSE counterpart by 10-25%,
> > > > > > > (depending on the ruleset).
> > > > > > > At runtime, this method is selected as default one on HW that supports AVX2.
> > > > > > >
> > > > > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
> > > > > > > ---
> > > > > > >  lib/librte_acl/Makefile       |   9 +
> > > > > > >  lib/librte_acl/acl.h          |   4 +
> > > > > > >  lib/librte_acl/acl_run.h      |   2 +-
> > > > > > >  lib/librte_acl/acl_run_avx2.c |  58 +++++
> > > > > > >  lib/librte_acl/acl_run_avx2.h | 305 ++++++++++++++++++++++++
> > > > > > >  lib/librte_acl/acl_run_sse.c  | 537 +-----------------------------------------
> > > > > > >  lib/librte_acl/acl_run_sse.h  | 533 +++++++++++++++++++++++++++++++++++++++++
> > > > > > >  lib/librte_acl/rte_acl.c      |   5 +-
> > > > > > >  lib/librte_acl/rte_acl.h      |   2 +
> > > > > > >  9 files changed, 917 insertions(+), 538 deletions(-)
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_avx2.c
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_avx2.h
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_sse.h
> > > > > > >
> > > > > > > diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> > > > > > > index 65e566d..223ec31 100644
> > > > > > > --- a/lib/librte_acl/Makefile
> > > > > > > +++ b/lib/librte_acl/Makefile
> > > > > > > @@ -45,8 +45,17 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
> > > > > > > +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_avx2.c
> > > > > > >
> > > > > > >  CFLAGS_acl_run_sse.o += -msse4.1
> > > > > > > +ifeq ($(CC), icc)
> > > > > > > +CFLAGS_acl_run_avx2.o += -march=core-avx2
> > > > > > > +else ifneq ($(shell \
> > > > > > > +test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 6 && echo 1), 1)
> > > > > > > +CFLAGS_acl_run_avx2.o += -mavx2
> > > > > > > +else
> > > > > > > +CFLAGS_acl_run_avx2.o += -msse4.1
> > > > > > > +endif
> > > > > > >
> > > > > > This seems broken.  You've unilaterally included acl_run_avx2.c in the build
> > > > > > list above, but only enable -mavx2 if the compiler is at least gcc 4.6.
> > > > >
> > > > > Actually 4.7 (before that version, as I know,  gcc doesn't support avx2)
> > > > >
> > > > > >  Unless
> > > > > > you want to make gcc 4.6 a requirement for building,
> > > > >
> > > > > I believe DPDK is required to be buildable by gcc 4.6
> > > > > As I remember, we have to support it all way down to gcc 4.3.
> > > > >
> > > > > > you need to also exclude
> > > > > > the file above from the build list.
> > > > >
> > > > > That means that for  gcc 4.6 and below rte_acl_classify_avx2() would not be defined.
> > > > > And then at runtime, I have to check for that somehow and (re)populate classify_fns[].
> > > > > Doesn't seems like a good way to me.
> > > > There are plenty of ways around that.
> > > >
> > > > At a minimum you could make the classify_fns array the one place that you need
> > > > to add an ifdef __AVX__ call.
> > > >
> > > > You could also create a secondary definition of rte_acl_classify_avx2, and mark
> > > > it as a weak symbol, which only returns -EOPNOTSUPP.  That would be good, since
> > > > the right thing will just automatically happen then if you don't build the
> > > > actual avx2 classification code
> > > >
> > > > > Instead, I prefer to always build acl_run_avx2.c,
> > >
> > >
> > > > But you can't do that.  You just said above that you need to support down to gcc
> > > > 4.3.  I see you've worked around that with some additional ifdef __AVX__
> > > > instructions, but in so doing you ignore the possibiity that sse isn't
> > > > supported, so you need to add __SSE__ checks now as well.  ifdeffing that much
> > > > just isn't scalable.
> > >
> > > We don't need to worry about compiler without SSE4.1 support.
> > > I believe that all compilers that DDPDK has to build with, do support SSE4.1.
> > > So for SSE4.1 we only has to worry about situation when target CPU doesn't support it
> > > We manage it by runtime selection.
> > > For AVX2 - situation is a bit different: it could be both compiler and target CPU that don't support it.
> > >
> > > >  And for your effort, you get an AVX2 classification path
> > > > that potentially doesn't actually do vectorized classification.
> > > >
> > > > It really seems better to me to not build the code if the compiler doesn't
> > > > support the instruction set it was meant to enable, and change the
> > > > classification function pointer to something that informs the user of the lack
> > > > of support at run time.
> > > >
> > > > > but for old compilers that don't support AVX2 -
> > > > > rte_acl_classify_avx2() would simply be identical to rte_acl_classify_sse().
> > > > >
> > > > That doesn't make sense to me, for two reasons:
> > > >
> > > > 1) What if the machine being targeted doesn't support sse either?
> > > >
> > >
> > > Exactly the same what is happening now on the machine with now SSE4.1 support.
> > > There is absolutely no difference here.
> > >
> > > > 2) If an application selects an AVX2 classifier, I as a developer expect to
> > > > either get AVX2 based classification, or an error indicating that I can't do
> > > > AVX2 classification, not a silent performance degradation down to scalar
> > > > classification.
> > >
> > > In fact I was considering both variants for compilers not supporting AVX2:
> > > 1. silently degrade to SSE method.
> > > 2. create  a dummy function rte_acl_classify_error() and put it  into classify_fns[RTE_ACL_CLASSIFY_AVX2].
> > >
> > > I choose #1 because it seems like a less distraction for the user -
> > > all would keep working as before, user just wouldn't see any improvement comparing to SSE method.
> > > Again didn't want to spread "ifdef __AVX2__" into rte_acl.c
> > > Though I don't have any strong opinion here.
> > > So if you can provide some good reason why #2 is preferable, I am ok to switch to #2.
> > >
> > Because 2 doesn't require any ifdeffing.  As you note above the problem here is
> > that AVX2 support is both compiler and machine dependent.  If you make a weak
> > symbol version of rte_acl_classify_avx2 that always gets built, then you've
> > reduced the problem to just being compiler support, which you can check in the
> > makefile.
> 
> I don't think we'll get rid of ifdefing with #2.
> We'll  remove 2 ifdefs in acl_run_avx2.h, but then we have to introduce 2 new in rte_acl.c instead.
> From my understanding, we we'll need something like that:
> 
> static const rte_acl_classify_t classify_fns[] = {
>         [RTE_ACL_CLASSIFY_DEFAULT] = rte_acl_classify_scalar,
>         [RTE_ACL_CLASSIFY_SCALAR] = rte_acl_classify_scalar,
>         [RTE_ACL_CLASSIFY_SSE] = rte_acl_classify_sse,
> +#if (defined __GNUC__ &&  __GNUC__ <= 4 && __GNUC_MINOR__ < 7)
> +      [RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_error,
> +#else  
>       [RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_avx2,
> +#endif


You don't need to do this, you need to use a weak symbol:
static int rte_acl_classify_avx2(...) __attributes__(weak)
{
	return -EOPNOTSUP
}


Then in the rte_acl_avx2.c file define it again without the weak symbol

That way, you do conditional compilation, and when you do the "real" symbol
overrides the weak one.

> };
> 
> static void __attribute__((constructor))
> rte_acl_init(void)
> {
>         enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
> 
> +#if (defined __GNUC__ &&  __GNUC__ <= 4 && __GNUC_MINOR__ < 7)
>         if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
>                 alg = RTE_ACL_CLASSIFY_AVX2;
>         else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> +#else
> +      if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
>                 alg = RTE_ACL_CLASSIFY_SSE;
> +#endif
>         rte_acl_set_default_classify(alg);
> }
> 
Why would you do this, this cpu feature flag definitions aren't matched to
compiler support, it should always be defined.  You should still be able to
check for AVX2 support in code that doesn't support emitting the instruction.

Neil

> Correct?
> Konstantin
> 
> > 
> > > >
> > > > > >  That in turn I think allows you to remove a
> > > > > > bunch of the ifdeffing that you've done in some of the avx2 specific files.
> > > > >
> > > > > Actually there are not many of them.
> > > > > One in acl_run_avx2.h and another in acl_run_avx2.c.
> > > > >
> > > > 2 in acl_run_avx2.h and 1 in rte_acl_osdep_alone.h, which is really 3 more than
> > > > you need if you just do an intellegent weak classifier function defintion.
> > >
> > > grep -n __AVX2__ lib/librte_acl/*.[c,h] | grep -v endif
> > > lib/librte_acl/acl_run_avx2.c:45:#ifdef __AVX2__
> > > lib/librte_acl/acl_run_avx2.h:36:#ifdef __AVX2__
> > >
> > > rte_acl_osdep_alone.h - is a different story.
> > > It needs to be there anyway, as in rte_common_vect.h.
> > > In fact  rte_acl_osdep_alone.h is only needed for cases when RTE_LIBRTE_ACL_STANDALONE=y.
> > > That comes from the old days, when we had to to support building librte_acl library without the rest of DPDK.
> > > I think we don't need it anymore and plan to remove it.
> > > Just thought it should  be in a separate patch.
> > > Konstantin
> > >
> > > >
> > > > Neil
> > >
> 


More information about the dev mailing list