[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

Walukiewicz, Miroslaw Miroslaw.Walukiewicz at intel.com
Thu Jan 22 13:45:21 CET 2015



> -----Original Message-----
> From: Liang, Cunming
> Sent: Thursday, January 22, 2015 1:20 PM
> To: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
> 
> 
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, January 22, 2015 5:53 PM
> > To: Liang, Cunming; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread
> >
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > > Sent: Thursday, January 22, 2015 9:17 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > > thread
> > >
> > > For non-EAL thread, bypass per lcore cache, directly use ring pool.
> > > It allows using rte_mempool in either EAL thread or any user pthread.
> > > A non-EAL thread relies directly on rte_ring, which is non-preemptive,
> > > so running multiple pthreads per cpu that compete for the same rte_mempool
> > > is not suggested: it gives bad performance and carries a critical risk if
> > > the scheduling policy is RT.
> > >
> > > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > > ---
> > >  lib/librte_mempool/rte_mempool.h | 18 +++++++++++-------
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/lib/librte_mempool/rte_mempool.h
> > > b/lib/librte_mempool/rte_mempool.h
> > > index 3314651..4845f27 100644
> > > --- a/lib/librte_mempool/rte_mempool.h
> > > +++ b/lib/librte_mempool/rte_mempool.h
> > > @@ -198,10 +198,12 @@ struct rte_mempool {
> > >   *   Number to add to the object-oriented statistics.
> > >   */
> > >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {			\
> > > -		unsigned __lcore_id = rte_lcore_id();		\
> > > -		mp->stats[__lcore_id].name##_objs += n;		\
> > > -		mp->stats[__lcore_id].name##_bulk += 1;		\
> > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {                    \
> > > +		unsigned __lcore_id = rte_lcore_id();           \
> > > +		if (__lcore_id < RTE_MAX_LCORE) {               \
> > > +			mp->stats[__lcore_id].name##_objs += n;	\
> > > +			mp->stats[__lcore_id].name##_bulk += 1;	\
> > > +		}                                               \
> > >  	} while(0)
> > >  #else
> > >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
> > >  	__MEMPOOL_STAT_ADD(mp, put, n);
> > >
> > >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > > -	/* cache is not enabled or single producer */
> > > -	if (unlikely(cache_size == 0 || is_mp == 0))
> > > +	/* cache is not enabled or single producer or non-EAL thread */
> >
> > I don't understand this limitation.
> >
> > I see that rte_mempool.h defines the cache table per RTE_MAX_LCORE, like below:
> > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> >         /** Per-lcore local cache. */
> >         struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> > #endif
> >
> > But why can't we extend the size of the local cache table to something like
> > RTE_MAX_THREADS, so that the maximum value of rte_lcore_id() never exceeds it?
> >
> > Keeping this condition here is a real performance killer!
> > In my test application I saw more than 95% of CPU time spent reading the atomic
> > in the MC/MP ring while accessing the mempool.
> [Liang, Cunming] This is the first step to make it work.
> Per Konstantin's comments, we should avoid allocating a unique id by ourselves,
> and the return value from gettid() is too large to use as an index.
> For the non-EAL thread performance gap, we will think about an additional fix patch here.
> If you care about performance, an EAL thread is still the preferred choice for now.

In the previous patch you allocated the thread id based on the unique gettid() value, a number
rather than a potential pointer, which is what we can expect from the gettid() implementations on Linux or BSD.

Another problem is that we compare an int here against some unique thread identifier.
How can you guarantee that, when the gettid() implementation changes, the unique thread identifier
will not fall below RTE_MAX_LCORE while still being unique?
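
For illustration only (this is my reading of the dispatch, not code from the patch): whatever
per-thread value rte_lcore_id() returns for a non-EAL thread, the new checks are only safe if
that value is guaranteed to stay >= RTE_MAX_LCORE:

	unsigned lcore_id = rte_lcore_id();  /* per-thread id, however it gets set */

	if (lcore_id < RTE_MAX_LCORE) {
		/* treated as an EAL thread: uses mp->local_cache[lcore_id],
		 * which belongs to a real lcore if the id ever collides */
	} else {
		/* treated as a non-EAL thread: falls back to the ring */
	}

If some future gettid()-style source ever produced a small value, such a thread would silently
share another core's cache.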

I think your assumption will work on the well-known operating systems, but it will be very unportable.

Regarding performance, DPDK can work efficiently in different environments, including plain pthreads.
You can imagine running DPDK from a pthread application where the affinity is set by the application itself.
Its effectiveness then depends on the application's thread implementation being comparable to EAL threads.
I think that is the goal of this change.
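
If the cache table were ever extended beyond RTE_MAX_LCORE as suggested above, one possible
direction (a rough sketch only; RTE_MAX_THREADS and mempool_cache_index() are hypothetical
names, not an existing DPDK API) would be to hand out a compact per-thread index once per
thread instead of reusing gettid():

	#include <rte_atomic.h>
	#include <rte_per_lcore.h>

	#define RTE_MAX_THREADS 256                  /* hypothetical upper bound */

	RTE_DEFINE_PER_LCORE(int, _cache_idx) = -1;  /* this thread's cache slot */
	static rte_atomic32_t next_cache_idx;        /* compact id allocator */

	static inline unsigned
	mempool_cache_index(void)
	{
		int idx = RTE_PER_LCORE(_cache_idx);

		if (idx < 0) {
			/* first use in this thread: grab the next slot; the value
			 * sticks, so the counter is bumped at most once per thread */
			idx = rte_atomic32_add_return(&next_cache_idx, 1) - 1;
			RTE_PER_LCORE(_cache_idx) = idx;
		}
		/* callers would treat idx >= RTE_MAX_THREADS as "no cache, use the ring" */
		return (unsigned)idx;
	}

The allocation cost is paid once per thread, so the fast path could keep the local cache
instead of hitting the ring's atomics on every operation.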

> >
> > Same comment for get operation below
> >
> > > +	if (unlikely(cache_size == 0 || is_mp == 0 ||
> > > +		     lcore_id >= RTE_MAX_LCORE))
> > >  		goto ring_enqueue;
> > >
> > >  	/* Go straight to ring if put would overflow mem allocated for cache
> > > */
> > > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
> > >  	uint32_t cache_size = mp->cache_size;
> > >
> > >  	/* cache is not enabled or single consumer */
> > > -	if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> > > +	if (unlikely(cache_size == 0 || is_mc == 0 ||
> > > +		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> > >  		goto ring_dequeue;
> > >
> > >  	cache = &mp->local_cache[lcore_id];
> > > --
> > > 1.8.1.4


