[dpdk-dev] [PATCH 1/5] ring: allow rings with non power-of-2 sizes

Bruce Richardson bruce.richardson at intel.com
Fri Jun 30 15:59:03 CEST 2017


On Fri, Jun 30, 2017 at 02:24:38PM +0200, Olivier Matz wrote:
> On Fri, 30 Jun 2017 12:32:27 +0100, Bruce Richardson
> <bruce.richardson at intel.com> wrote:
> > On Fri, Jun 30, 2017 at 11:40:46AM +0200, Olivier Matz wrote:
> > > Hi Bruce,
> > > 
> > > Few comments inline.
> > > 
> > > On Wed,  7 Jun 2017 14:36:16 +0100, Bruce Richardson <bruce.richardson at intel.com> wrote:  
> > > > The rte_rings traditionally have only supported having ring sizes as powers
> > > > of 2, with the actual usable space being the size - 1. In some cases, for
> > > > example, with an eventdev where we want to precisely control queue depths
> > > > for latency, we need to allow ring sizes which are not powers of two so we
> > > > add in an additional ring capacity value to allow that. For existing rings,
> > > > this value will be size-1, i.e. the same as the mask, but if the new
> > > > EXACT_SZ flag is passed on ring creation, the ring will have exactly the
> > > > usable space requested, although the underlying memory size may be bigger.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>  
> > > 
> > > Specifying the exact wanted size looks to be a better API than the current
> > > one, which is to give the power of two above the wanted value, knowing there
> > > will be only size-1 elements available.
> > > 
> > > What would you think about deprecating ring init/creation without this
> > > flag in a few versions? Then, later, we could remove this flag and
> > > the new behavior would become the default.
> > >  
> > 
> > I'm not really sure it's necessary. For the vast majority of cases what
> > we have now is fine, and it ensures that we maximise the ring space. I'd
> > tend to keep the new behaviour for those cases where we really need it.
> > 
> > It's a trade off between what we hide and expose:
> > * current scheme hides the fact that you get one entry less than you ask
> >   for, but the memory space is as expected.
> > * new scheme gives you exactly the entries you ask for, but hides the
> >   fact that you could be using up to double the memory space for the
> >   ring.
> 
> Yes, having 2 different behavior is precisely what I don't really like.
> Giving the number of entries the user asks for is straightforward, which
> is a good thing for an API. And hiding the extra consumed memory looks
> acceptable, this can be documented.
> 
> That said, if we decide to deprecate it, there's no need to do it right
> now.
> 
> 
> > > > @@ -845,69 +857,63 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
> > > >  }
> > > >  
> > > >  /**
> > > > - * Test if a ring is full.
> > > > + * Return the number of entries in a ring.
> > > >   *
> > > >   * @param r
> > > >   *   A pointer to the ring structure.
> > > >   * @return
> > > > - *   - 1: The ring is full.
> > > > - *   - 0: The ring is not full.
> > > > + *   The number of entries in the ring.
> > > >   */
> > > > -static inline int
> > > > -rte_ring_full(const struct rte_ring *r)
> > > > +static inline unsigned
> > > > +rte_ring_count(const struct rte_ring *r)
> > > >  {
> > > >  	uint32_t prod_tail = r->prod.tail;
> > > >  	uint32_t cons_tail = r->cons.tail;
> > > > -	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
> > > > +	return (prod_tail - cons_tail) & r->mask;
> > > >  }  
> > > 
> > > When used on a mc/mp ring, this function can now return a value
> > > which is higher than the ring capacity. Even if this function is
> > > not really accurate when used while the ring is use, I think we
> > > should maximize the result with r->capacity.
> > > 
> > > This will also avoid rte_ring_free_count() to return a overflowed
> > > value.
> > >   
> > 
> > How does it return an overflowing value? I think in the MP/MC case the
> > tests made each time around the loop for cmpset should ensure we never
> > overflow.
> 
> Here we are reading r->prod.tail and r->cons.tail without
> synchronization, nor memory barriers. So basically, there is no
> guarantee that the values are consistent.
> 
> Before, the returned value could be wrong, but always in the
> interval [0, mask].
> 
> Now, rte_ring_count() is still in the interval [0, mask], but the
> capacity of the ring is lower than mask. If rte_ring_count() returns
> mask, it would cause rte_ring_free_count() to return a value lower than
> 0 (overflowed since the result is unsigned).
> 
> 
I'm still not convinced that this can actually happen, but just in case,
I'll add in the extra check in V2.

/Bruce


More information about the dev mailing list