[dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Jun 9 19:28:55 CEST 2017


-----Original Message-----
> Date: Fri, 9 Jun 2017 10:16:25 -0700
> From: Stephen Hemminger <stephen at networkplumber.org>
> To: Yerden Zhumabekov <e_zhumabekov at sts.kz>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev at intel.com>, "Richardson,
>  Bruce" <bruce.richardson at intel.com>, "Verkamp, Daniel"
>  <daniel.verkamp at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
> 
> On Fri, 9 Jun 2017 18:47:43 +0600
> Yerden Zhumabekov <e_zhumabekov at sts.kz> wrote:
> 
> > On 06.06.2017 19:19, Ananyev, Konstantin wrote:
> > >  
> > >>>> Maybe there is some deeper reason for the >= 128-byte alignment logic in rte_ring.h?
> > >>> Might be; it would be good to hear the opinion of the author of that change.
> > >> It gives improved performance for core-2-core transfer.  
> > > You mean empty cache-line(s) after prod/cons, correct?
> > > That's OK, but why can't we keep them, and the whole rte_ring, aligned on cache-line boundaries?
> > > Something like that:
> > > struct rte_ring {
> > >     ...
> > >     struct rte_ring_headtail prod __rte_cache_aligned;
> > >     EMPTY_CACHE_LINE   __rte_cache_aligned;
> > >     struct rte_ring_headtail cons __rte_cache_aligned;
> > >     EMPTY_CACHE_LINE   __rte_cache_aligned;
> > > };
> > >
> > > Konstantin
> > >  
> > 
> > I'm curious: can anyone explain how it actually affects
> > performance? Maybe we can utilize it in application code?
> 
> I think it is because on Intel CPUs the CPU speculatively fetches adjacent cache lines.
> If these cache lines change, then it will create false sharing.

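I see. I think in such cases it is better to abstract this behind
conditional compilation. The above layout has the worst-case memory
cost on a CPU with 128-byte cache lines and no speculative prefetch:
each empty line then costs 128 bytes while guarding against a prefetch
that does not happen.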

