[dpdk-dev] [PATCHv3] librte_acl make it build/work for 'default' target

Neil Horman nhorman at tuxdriver.com
Thu Aug 28 17:55:07 CEST 2014


On Wed, Aug 27, 2014 at 07:18:44PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Wednesday, August 27, 2014 7:57 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> > Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' target
> > 
> > On Wed, Aug 27, 2014 at 11:25:04AM +0000, Ananyev, Konstantin wrote:
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Tuesday, August 26, 2014 6:45 PM
> > > > To: Ananyev, Konstantin
> > > > Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> > > > Subject: Re: [PATCHv3] librte_acl make it build/work for 'default' target
> > > >
> > > > On Mon, Aug 25, 2014 at 04:30:05PM +0000, Ananyev, Konstantin wrote:
> > > > > Hi Neil,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > Sent: Thursday, August 21, 2014 9:15 PM
> > > > > > To: dev at dpdk.org
> > > > > > Cc: Ananyev, Konstantin; thomas.monjalon at 6wind.com; Neil Horman
> > > > > > Subject: [PATCHv3] librte_acl make it build/work for 'default' target
> > > > > >
> > > > > > Make ACL library to build/work on 'default' architecture:
> > > > > > - make rte_acl_classify_scalar really scalar
> > > > > > >  (make sure it wouldn't use sse4 intrinsics through resolve_priority()).
> > > > > > - Provide two versions of rte_acl_classify code path:
> > > > > > >   rte_acl_classify_sse() - can be built and used only on systems with sse4.2
> > > > > > >   and above; returns -ENOTSUP on lower archs.
> > > > > > >   rte_acl_classify_scalar() - a slower version, but can be built and used
> > > > > > >   on all systems.
> > > > > > - keep common code shared between these two codepaths.
> > > > > >
> > > > > > > v2 changes:
> > > > > > >  run-time selection of most appropriate code-path for given ISA.
> > > > > > >  By default the highest supported one is selected.
> > > > > >  User can still override that selection by manually assigning new value to
> > > > > >  the global function pointer rte_acl_default_classify.
> > > > > >  rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
> > > > > >  points to.
> > > > > >
> > > > >
> > > > > I see you decided not to wait for me and fix everything by yourself :)
> > > > >
> > > > Yeah, sorry, I'm getting pinged about enabling these features in Fedora, and it
> > > > had been about 2 weeks, so I figured I'd just take care of it.
> > >
> > > No worries. I admit that it was a long delay from my side.
> > >
> > > >
> > > > > > V3 Changes
> > > > > >  Updated classify pointer to be a function so as to better preserve ABI
> > > > >
> > > > > As I said in my previous mail it generates extra jump...
> > > > > Though from numbers I got the performance impact is negligible: < 1%.
> > > > > So I suppose, I don't have a good enough reason to object :)
> > > > >
> > > > Yeah, I just don't see a way around it.  I was hoping that the compiler would
> > > > have been smart enough to see that the rte_acl_classify function was small and
> > > > in-linable, but apparently it won't do that.  As you note however the
> > > > performance change is minor (I'm guessing within a standard deviation of your
> > > > results).
> > > >
> > > > > > Though I still think we better keep rte_acl_classify_scalar() publicly available (same as we do for rte_acl_classify_sse()):
> > > > > > First of all, rte_acl_classify_scalar() is already part of our public API.
> > > > > Also, as I remember, one of the customers explicitly asked for scalar version and they planned to call it directly.
> > > > > Plus using rte_acl_select_classify() to always switch between implementations is not always handy:
> > > >
> > > > I'm not exactly opposed to this, though it seems odd to me that a user might
> > > > want to call a particular version of the classifier directly.  But I certainly
> > > > can't predict everything a consumer wants to do.  If we really need to keep it
> > > > public then, it begs the question, is providing a generic entry point even
> > > > worthwhile?  Is it just as easy to expose the scalar/sse and any future versions
> > > > > directly so the application can just embody the intelligence to select the best
> > > > path?  That saves us having to maintain another API point.  I can go with
> > > > consensus on that.
> > > >
> > > > > -  it is global, which means that we can't simultaneously use classify_scalar() and classify_sse() for 2 different ACL contexts.
> > > > > - to properly support such switching we then will need to support something like (see app/test/test_acl.c below):
> > > > >   old_alg = rte_acl_get_classify();
> > > > >   rte_acl_select_classify(new_alg);
> > > > >   ...
> > > > >   rte_acl_select_classify(old_alg);
> > > > >
> > > > We could attach the classification method to the acl context, so each
> > > > > rte_acl_ctx can point to whatever classifier function it wants to.  That would
> > > > remove the global issues you point out above.
> > >
> > > I thought about that approach too.
> > > But there is one implication with DPDK MP model:
> > > Same ACL context can be shared by different DPDK processes,
> > > while acl_classify() could be loaded to the different addresses.
> > > > Of course we can overcome it by creating a global table of function pointers indexed by classify_alg and
> > > > storing the alg inside the ACL ctx instead of the actual function pointer.
> > > But that means extra overhead of at least two loads per classify() call.
> > >
> > Hmm, how is the context shared around between processes?  Is it just shared as a
> > common cow data page resulting from a fork?  If so, then we should be good
> > because the DSO text will be at the same address (i.e. the pointer will still be
> > valid).  If you do some sort of message passing, then, yes, that's a problem.
> > 
> 
> No, it is not a parent-child relationship.
> There could be a group of independently spawned processes.
> One of them should be the 'primary' (it starts first); the others are 'secondaries'.
> All hugepage memory pages mapped by the primary process are supposed to be mapped to the same VAs by each secondary.
> So all stuff that is allocated from hugepage memory is shared between all processes in the group.
> More detailed description: http://dpdk.org/doc/intel/dpdk-prog-guide-1.7.0.pdf, section 23.
> 
Ugh, so because you explicitly share heap memory space across all processes, we
can never guarantee that pointers to statically allocated symbols, like functions
or global data, stored in that memory are valid in every process.  Great.  Ok,
I'll try to rework this.
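
For reference, a rough sketch of the index-based dispatch described above,
where the shared context stores an algorithm id rather than a code address.
All names here (acl_ctx, classify_alg, classify_fns, acl_classify) are made up
for illustration and are not the librte_acl API; the two classify bodies are
placeholders that only tag which path ran.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the librte_acl API. */
enum classify_alg {
	CLASSIFY_ALG_SCALAR = 0,
	CLASSIFY_ALG_SSE,
	CLASSIFY_ALG_NUM
};

/*
 * Would live in shared (hugepage) memory, so it holds an index
 * and never a function address.
 */
struct acl_ctx {
	enum classify_alg alg;
};

typedef int (*classify_fn)(const struct acl_ctx *ctx, const uint8_t **data,
	uint32_t *results, uint32_t num);

/* Placeholder: marks every result with 1 to show the scalar path ran. */
static int
classify_scalar(const struct acl_ctx *ctx, const uint8_t **data,
	uint32_t *results, uint32_t num)
{
	uint32_t i;

	(void)ctx;
	(void)data;
	for (i = 0; i < num; i++)
		results[i] = 1;
	return 0;
}

/* Placeholder: marks every result with 2 to show the "sse" path ran. */
static int
classify_sse(const struct acl_ctx *ctx, const uint8_t **data,
	uint32_t *results, uint32_t num)
{
	uint32_t i;

	(void)ctx;
	(void)data;
	for (i = 0; i < num; i++)
		results[i] = 2;
	return 0;
}

/*
 * Per-process table in ordinary (non-shared) data: each process fills
 * in the addresses valid for its own mapping of the library text.
 * Order must match enum classify_alg.
 */
static const classify_fn classify_fns[CLASSIFY_ALG_NUM] = {
	classify_scalar,
	classify_sse,
};

static inline int
acl_classify(const struct acl_ctx *ctx, const uint8_t **data,
	uint32_t *results, uint32_t num)
{
	/* Two loads before the call: ctx->alg, then the table slot. */
	return classify_fns[ctx->alg](ctx, data, results, num);
}
```

Two contexts with different alg values can then be classified side by side
without touching any global selection, at the cost of the extra loads noted
above.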
Neil


