[dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure

Olivier Matz olivier.matz at 6wind.com
Tue Apr 3 17:56:01 CEST 2018


On Tue, Apr 03, 2018 at 09:07:04PM +0530, Jerin Jacob wrote:
> -----Original Message-----
> > Date: Tue, 3 Apr 2018 17:25:17 +0200
> > From: Olivier Matz <olivier.matz at 6wind.com>
> > To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > CC: dev at dpdk.org, konstantin.ananyev at intel.com, bruce.richardson at intel.com
> > Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> >  structure
> > User-Agent: NeoMutt/20170113 (1.7.2)
> > 
> > On Tue, Apr 03, 2018 at 08:37:23PM +0530, Jerin Jacob wrote:
> > > -----Original Message-----
> > > > Date: Tue, 3 Apr 2018 15:26:44 +0200
> > > > From: Olivier Matz <olivier.matz at 6wind.com>
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > > >  structure
> > > > X-Mailer: git-send-email 2.11.0
> > > > 
> > > > The initial objective of
> > > > commit d9f0d3a1ffd4 ("ring: remove split cacheline build setting")
> > > > was to add an empty cache line between the producer and consumer
> > > > data (on platforms with cache line size = 64B), preventing them
> > > > from ending up on adjacent cache lines.
> > > > 
> > > > Following discussion on the mailing list, it appears that this
> > > > also imposes an alignment constraint that is not required.
> > > > 
> > > > This patch removes the extra alignment constraint and adds the
> > > > empty cache lines using padding fields in the structure. The
> > > > size of rte_ring structure and the offset of the fields remain
> > > > the same on platforms with cache line size = 64B:
> > > > 
> > > >   rte_ring = 384
> > > >   rte_ring.name = 0
> > > >   rte_ring.flags = 32
> > > >   rte_ring.memzone = 40
> > > >   rte_ring.size = 48
> > > >   rte_ring.mask = 52
> > > >   rte_ring.prod = 128
> > > >   rte_ring.cons = 256
> > > > 
> > > > But it has an impact on platforms where the cache line size is 128B:
> > > > 
> > > >   rte_ring = 384        -> 768
> > > >   rte_ring.name = 0
> > > >   rte_ring.flags = 32
> > > >   rte_ring.memzone = 40
> > > >   rte_ring.size = 48
> > > >   rte_ring.mask = 52
> > > >   rte_ring.prod = 128   -> 256
> > > >   rte_ring.cons = 256   -> 512
> > > 
> > > Are we leaving TWO cache lines to make sure the HW prefetcher
> > > doesn't load the adjacent cache line (consumer)?
> > > 
> > > If so, will it have an impact on those machines where the cache line
> > > is 128B and the HW prefetcher is not explicitly loading the next
> > > cache line. Right?
> > 
> > The impact on machines that have a 128B cache line is that an unused
> > cache line will be added between the producer and consumer data. I
> > expect that the impact is positive in case there is a hw prefetcher, and
> > null in case there is no such prefetcher.
> 
> It is not null, right? You are losing 256B for each ring.

Is it really that important?


> > On machines with 64B cache line, this was already the case. It just
> > reduces the alignment constraint.
> 
> Not all the 64B CL machines will have HW prefetch.
> 
> I would recommend adding a conditional compilation flag to express
> whether HW prefetch is enabled or not; based on that, we can decide
> whether to reserve the additional space. By default, in the common
> config, HW prefetch can be enabled so that it works for almost all
> cases.

The hw prefetcher can be enabled at runtime, so a compilation flag
does not seem to be a good idea. Moreover, changing this compilation
flag would change the ABI.

