[dpdk-dev] [PATCH v3 1/8] stack: introduce rte stack library

Eads, Gage gage.eads at intel.com
Mon Apr 1 21:34:32 CEST 2019


> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
> Sent: Monday, April 1, 2019 12:41 PM
> To: Eads, Gage <gage.eads at intel.com>; dev at dpdk.org
> Cc: olivier.matz at 6wind.com; arybchenko at solarflare.com; Richardson, Bruce
> <bruce.richardson at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu at arm.com>; nd <nd at arm.com>; thomas at monjalon.net; nd
> <nd at arm.com>
> Subject: RE: [PATCH v3 1/8] stack: introduce rte stack library
> 
> >
> > > > +static ssize_t
> > > > +rte_stack_get_memsize(unsigned int count) {
> > > > +	ssize_t sz = sizeof(struct rte_stack);
> > > > +
> > > > +	/* Add padding to avoid false sharing conflicts */
> > > > +	sz += RTE_CACHE_LINE_ROUNDUP(count * sizeof(void *)) +
> > > > +		2 * RTE_CACHE_LINE_SIZE;
> > > I did not understand how the false sharing is caused and how this
> > > padding is solving the issue. Verbose comments would help.
> >
> > The additional padding (beyond the CACHE_LINE_ROUNDUP) is to prevent
> > false sharing caused by adjacent/next-line hardware prefetchers. I'll
> > address this.
> >
> Is it not a generic problem? Or is it specific to this library?

This is not limited to this library, but it only affects systems with (enabled) next-line prefetchers, for example Intel products with an L2 adjacent cache line prefetcher[1]. On those systems, padding beyond a single cache line can potentially improve performance. As I understand it, this was the reason behind the 128B alignment added to rte_ring a couple of years ago[2].

[1] https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors
[2] http://mails.dpdk.org/archives/dev/2017-February/058613.html
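
To make the sizing arithmetic under discussion concrete, here is a minimal standalone sketch. It is not the patch's code: demo_stack_hdr, demo_stack_memsize, CACHE_LINE_SIZE and CACHE_LINE_ROUNDUP are hypothetical stand-ins for struct rte_stack, rte_stack_get_memsize, RTE_CACHE_LINE_SIZE and RTE_CACHE_LINE_ROUNDUP, and the 64B line size is an assumption. The reading of the two extra lines as guard space against an adjacent-line prefetcher is also my interpretation of the thread, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; DPDK takes this from the build config. */
#define CACHE_LINE_SIZE 64u

/* Round a byte count up to a whole number of cache lines. */
#define CACHE_LINE_ROUNDUP(sz) \
	(((sz) + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE)

/* Hypothetical stand-in for the stack header (not the real struct rte_stack). */
struct demo_stack_hdr {
	unsigned int capacity;
	unsigned int len;
};

/* Bytes needed for a stack that can hold 'count' object pointers. */
static size_t
demo_stack_memsize(unsigned int count)
{
	size_t sz = sizeof(struct demo_stack_hdr);

	/* Guard against size_t overflow on targets with a narrow size_t. */
	assert(count <= (SIZE_MAX - sz - 3 * CACHE_LINE_SIZE) / sizeof(void *));

	/* Pointer table, rounded up so it ends on a cache-line boundary. */
	sz += CACHE_LINE_ROUNDUP(count * sizeof(void *));

	/*
	 * Two extra lines of padding. Assumption: an adjacent-line prefetcher
	 * can pull in the neighbouring 64B line, so one line of padding alone
	 * may not keep another object's hot data away from the table.
	 */
	sz += 2 * CACHE_LINE_SIZE;

	return sz;
}
```

With these assumed values, demo_stack_memsize(1) comes to the header size plus 64B for the table plus 128B of padding. The padding cost is fixed per stack, so it only matters for very small stacks.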

