[dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Aug 27 13:25:04 CEST 2014


> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, August 26, 2014 6:45 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' target
> 
> On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote:
> > Hi Neil,
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Thursday, August 21, 2014 9:15 PM
> > > To: dev at dpdk.org
> > > Cc: Ananyev, Konstantin; thomas.monjalon at 6wind.com; Neil Horman
> > > Subject: [PATCHv3] librte_acl make it build/work for 'default' target
> > >
> > > Make ACL library to build/work on 'default' architecture:
> > > - make rte_acl_classify_scalar really scalar
> > >  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> > > - Provide two versions of rte_acl_classify code path:
> > >   rte_acl_classify_sse() - could be build and used only on systems with sse4.2
> > >   and upper, return -ENOTSUP on lower arch.
> > >   rte_acl_classify_scalar() - a slower version, but could be build and used
> > >   on all systems.
> > > - keep common code shared between these two codepaths.
> > >
> > > v2 chages:
> > >  run-time selection of most appropriate code-path for given ISA.
> > >  By default the highest supprted one is selected.
> > >  User can still override that selection by manually assigning new value to
> > >  the global function pointer rte_acl_default_classify.
> > >  rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
> > >  points to.
> > >
> >
> > I see you decided not to wait for me and fix everything by yourself :)
> >
> Yeah, sorry, I'm getting pinged about enabling these features in Fedora, and it
> had been about 2 weeks, so I figured I'd just take care of it.

No worries. I admit that it was a long delay from my side.

> 
> > > V3 Changes
> > >  Updated classify pointer to be a function so as to better preserve ABI
> >
> > As I said in my previous mail it generates extra jump...
> > Though from numbers I got the performance impact is negligible: < 1%.
> > So I suppose, I don't have a good enough reason to object :)
> >
> Yeah, I just don't see a way around it.  I was hoping that the compiler would
> have been smart enough to see that the rte_acl_classify function was small and
> in-linable, but apparently it won't do that.  As you note however the
> performance change is minor (I'm guessing within a standard deviation of your
> results).
> 
> > Though I still think we better keep  rte_acl_classify_scalar() publically available (same as we do for rte acl_classify_sse()):
> > First of all keep  rte_acl_classify_scalar() is already part of our public API.
> > Also, as I remember, one of the customers explicitly asked for scalar version and they planned to call it directly.
> > Plus using rte_acl_select_classify() to always switch between implementations is not always handy:
> 
> I'm not exactly opposed to this, though it seems odd to me that a user might
> want to call a particular version of the classifier directly.  But I certainly
> can't predict everything a consumer wants to do.  If we really need to keep it
> public then, it begs the question, is providing a generic entry point even
> worthwhile?  Is it just as easy to expose the scalar/sse and any future versions
> directly so the application can just embody the intellegence to select the best
> path?  That saves us having to maintain another API point.  I can go with
> consensus on that.
> 
> > -  it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts.
> > - to properly support such switching we then will need to support something like (see app/test/test_acl.c below):
> >   old_alg = rte_acl_get_classify();
> >   rte_acl_select_classify(new_alg);
> >   ...
> >   rte_acl_select_classify(old_alg);
> >
> We could attach the classification method to the acl context, so each
> rte_acl_ctx can point to whatever classifier funtion it wants to.  That would
> remove the global issues you point out above.

I thought about that approach too.
But there is one implication with DPDK MP model: 
Same ACL context can be shared by different DPDK processes,
while acl_classify() could be loaded to the different addresses.
Of course we can overcome it by creating a global table of function pointers indexed by calssify_alg and
store inside ACL ctx alg instead of actual function pointer.
But that means extra overhead of at least two loads per classify() call.

>  Or alternatively we can just not
> provide a generic entry point and let each user select a specific function.

I wonder can we have sort of mixed approach:
1. provide a generic entry point that would be set to the best (from our knowledge) available classify function.
2. Let each user use a specific function if he wants too.

i.e: 
- keep classify_scalar/classify_sse/classify_... public.
- keep your current implementation of rte_acl_classify()
BTW in that way, we probably can make acl_select_classify() static. 

So most users would just use generic entry point and wouldn't need to write their own code wrappers around it.
For users who need to use a particular classify()  version - they can call it directly.

> 
> 
> > >  REmoved macro definitions for match check functions to make them static inline
> >
> > More comments inlined below.
> >snip>
> > >
> > >  	/* make a quick check for scalar */
> > > -	ret = rte_acl_classify_scalar(acx, data, results,
> > > +	rte_acl_select_classify(ACL_CLASSIFY_SCALAR);
> > > +	ret = rte_acl_classify(acx, data, results,
> > >  			RTE_DIM(acl_test_data), RTE_ACL_MAX_CATEGORIES);
> >
> >
> > As I said above, that doesn't seem correct: we set rte_acl_default_classify = rte_acl_classify_scalar and never restore it back to the
> original value.
> > To support it properly, we need to:
> > old_alg = rte_acl_get_classify();
> >  rte_acl_select_classify(new_alg);
> >  ...
> >  rte_acl_select_classify(old_alg);
> >
> So, for the purposes of this test application, I don't see that as being needed.
> Every call to rte_acl_classify is preceded by a setting of the classifier
> function, so you're safe.

Not every, that's a problem.
As I can see, in test/test_acl.c you replaced
rte_acl_classify_scalar();
with
rte_acl_select_classify(SCALAR);
rte_acl_classify();

And never restore previous value of rte_acl_default_classify.
Right now rte_acl_default_classify is global, so after first:
rte_acl_select_classify(SCALAR);
all subsequent rte_acl_classify() will actually use scalar version.

>  If you're concerned about other processes using the
> dpdk library at the same time, you're still safe, as despite being a global
> variable, data pages in a DSO are Copy on Write, so each process gets their own
> copy of the global variable.

No, my concern here is only about  app/test here. 

> 
> Multiple threads within the same process are problematic, I agree, and thats
> solvable with the per-acl-context mechanism that I described above, though that
> shouldn't be needed here as this seems to be a single threaded program.
> 
> > Make all this just to keep UT valid seems like a big hassle to me.
> > So I said above - probably better just leave it to call rte_acl_classify_scalar() directly.
> >
> That works for me too, though the per-context mechanism seems kind of nice to
> me.  Let me know what you prefer.
> 
> ><snip>
> > >
> > > diff --git a/lib/librte_acl/acl_match_check.h b/lib/librte_acl/acl_match_check.h
> > > new file mode 100644
> > > index 0000000..4dc1982
> > > --- /dev/null
> > > +++ b/lib/librte_acl/acl_match_check.h
> >
> > As a nit: we probably don't need a special header just for one function and can place it inside acl_run.h.
> >
> Agreed, I can move that to acl_run.h.
> 
> ><snip>
> > > + */
> > > +static inline uint64_t
> > > +acl_match_check(uint64_t transition, int slot,
> > > +	const struct rte_acl_ctx *ctx, struct parms *parms,
> > > +	struct acl_flow_data *flows, void (*resolve_priority)(
> > > +	uint64_t transition, int n, const struct rte_acl_ctx *ctx,
> > > +	struct parms *parms, const struct rte_acl_match_results *p,
> > > +	uint32_t categories))
> >
> > Ugh, that's really hard to read.
> > Can we create a typedef for resolve_priority function type:
> > typedef void (*resolve_priority_t)(uint64_t, int,
> >         const struct rte_acl_ctx *ctx, struct parms *,
> >         const struct rte_acl_match_results *, uint32_t);
> > And use it here?
> >
> Sure, I'm fine with doing that.
> 
> ><snip>
> > > +
> > > +/* by default, use always avaialbe scalar code path. */
> > > +rte_acl_classify_t rte_acl_default_classify = rte_acl_classify_scalar;
> >
> > Why not 'static'?
> > I thought you'd like to hide it  from external world.
> >
> Doh!  I didn't do the one thing that I really meant to do.  I removed it from
> the header file but I forgot to declare the variable static.  I'll fix that.
> 
> > > +
> > > +void rte_acl_select_classify(enum acl_classify_alg alg)
> > > +{
> > > +
> > > +	switch(alg)
> > > +	{
> > > +		case ACL_CLASSIFY_DEFAULT:
> > > +		case ACL_CLASSIFY_SCALAR:
> > > +			rte_acl_default_classify = rte_acl_classify_scalar;
> > > +			break;
> > > +		case ACL_CLASSIFY_SSE:
> > > +			rte_acl_default_classify = rte_acl_classify_sse;
> > > +			break;
> > > +	}
> > > +
> > > +}
> >
> > As this is init phase function, I suppose we can add check that alg has a valid(supported) value, and return some error as return
> value, if not.
> >
> Not sure I follow what you're saying above, are you suggesting that we add a
> rte_cpu_get_flag_enabled check to rte_acl_select_classify above?
> 
> ><snip>
> > >   *
> > > @@ -286,9 +289,10 @@ rte_acl_reset(struct rte_acl_ctx *ctx);
> > >   * @return
> > >   *   zero on successful completion.
> > >   *   -EINVAL for incorrect arguments.
> > > + *   -ENOTSUP for unsupported platforms.
> >
> > Please remove the line above: current implementation doesn't return ENOTSUP
> > (I think that was left from v1).
> >
> Yup, probably was.  I'll remove it.
> 
> > >   */
> > >  int
> > > -rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > +rte_acl_classify_sse(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > >  	uint32_t *results, uint32_t num, uint32_t categories);
> > >
> > >  /**
> > > @@ -323,9 +327,23 @@ rte_acl_classify(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > >   *   zero on successful completion.
> > >   *   -EINVAL for incorrect arguments.
> > >   */
> > > -int
> > > -rte_acl_classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > -	uint32_t *results, uint32_t num, uint32_t categories);
> >
> >
> > As I said above we'd better keep it.
> >
> Ok, can do.
> 
> > > +
> > > +enum acl_classify_alg {
> > > +	ACL_CLASSIFY_DEFAULT = 0,
> > > +	ACL_CLASSIFY_SCALAR = 1,
> > > +	ACL_CLASSIFY_SSE = 2,
> > > +};
> >
> > As a nit: as this emum is part of public API, I think it is better to add rte_ prefix: enum rte_acl_classify_alg
> >
> Sure, done.
> 
> > > +
> > > +extern inline int rte_acl_classify(const struct rte_acl_ctx *ctx,
> > > +				   const uint8_t **data,
> > > +				   uint32_t *results, uint32_t num,
> > > +				   uint32_t categories);
> >
> > Again as a nit: here and everywhere can we keep same style through the whole DPDK - function name from the new line:
> > extern nt
> > rte_acl_classify(...);
> >
> Ok
> 
> I'll produce another version based on your feedback regarding the
> per-context-calssifier method vs. just removing the generic classifier.
> 
> Regards
> Neil



More information about the dev mailing list