[dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API

Ananyev, Konstantin konstantin.ananyev at intel.com
Fri Oct 11 16:41:01 CEST 2019



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
> Sent: Friday, October 11, 2019 6:04 AM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; stephen at networkplumber.org; paulmck at linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang at intel.com>; Medvedkin, Vladimir <vladimir.medvedkin at intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang at arm.com>; Dharmik Thakkar <Dharmik.Thakkar at arm.com>; dev at dpdk.org; nd <nd at arm.com>; nd
> <nd at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH v3 1/3] lib/ring: add peek API
> 
> >
> > > <snip>
> > >
> > > >
> > > > >
> > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > >
> > > > > > > From: Ruifeng Wang <ruifeng.wang at arm.com>
> > > > > > >
> > > > > > > The peek API allows fetching the next available object in the
> > > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > > dequeuing of objects depends on their value.
> > > > > > >
> > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar at arm.com>
> > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > <honnappa.nagarahalli at arm.com>
> > > > > > > Reviewed-by: Gavin Hu <gavin.hu at arm.com>
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 30 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > > > >  				r->cons.single, available);
> > > > > > >  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Peek one object from a ring.
> > > > > > > + *
> > > > > > > + * The peek API allows fetching the next available object in the ring
> > > > > > > + * without dequeuing it. This API is not multi-thread safe with respect
> > > > > > > + * to other consumer threads.
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_p
> > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > + * @return
> > > > > > > + *   - 0: Success, object available
> > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +static __rte_always_inline int
> > > > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > > > >
> > > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > > follow other rte_ring functions naming conventions
> > > > > > (rte_ring_sc_peek() or so).
> > > > > Agree
> > > > >
> > > > > >
> > > > > > As a better alternative what do you think about introducing a
> > > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > > Something like that:
> > > > > >
> > > > > > /* same as original ring dequeue, but:
> > > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > > >  * 2) don't update cons.tail
> > > > > >  */
> > > > > > unsigned int
> > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > > > 		unsigned int n, unsigned int *available);
> > > > > >
> > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > > >
> > > > > > /* resets cons.head to cons.tail value */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > >
> > > > > > Then your dq_reclaim cycle function will look like this:
> > > > > >
> > > > > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > > > > uint32_t avl, n;
> > > > > > uintptr_t elt[nb_elt];
> > > > > > ...
> > > > > >
> > > > > > do {
> > > > > >
> > > > > >   /* read next elem from the queue */
> > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > >   if (n == 0)
> > > > > >       break;
> > > > > >
> > > > > >   /* wrong period, keep elem in the queue */
> > > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > >      break;
> > > > > >   }
> > > > > >
> > > > > >   /* can reclaim, remove elem from the queue */
> > > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > >
> > > > > >   /* call reclaim function */
> > > > > >   dq->f(dq->p, elt);
> > > > > >
> > > > > > } while (avl >= nb_elt);
> > > > > >
> > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > As long as the actual reclamation callback itself is MT safe, of course.
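
For illustration only, here is a rough, self-contained sketch of the head/tail
transitions such a serialized dequeue could perform. The struct layout, the
function names and the absence of atomics/memory-ordering are simplifications
made for this example; this is not actual rte_ring code.

#include <stdint.h>

/* simplified stand-in for the ring's producer/consumer metadata */
struct sketch_ring {
	uint32_t prod_tail;   /* last published producer position */
	uint32_t cons_head;   /* staged (unpublished) consumer position */
	uint32_t cons_tail;   /* published consumer position */
	uint32_t mask;        /* size - 1, size is a power of two */
	void **objs;          /* object table */
};

/* copy out up to n objects; advance cons_head but not cons_tail */
static unsigned int
serial_dequeue_bulk(struct sketch_ring *r, void **obj_table,
		unsigned int n, unsigned int *available)
{
	uint32_t head = r->cons_head;
	uint32_t entries = r->prod_tail - head;

	if (available != NULL)
		*available = entries;

	/* 1) only proceed if no other serialized dequeue is in flight */
	if (head != r->cons_tail || n > entries)
		return 0;

	for (unsigned int i = 0; i != n; i++)
		obj_table[i] = r->objs[(head + i) & r->mask];

	/* 2) stage the dequeue: cons_tail is left untouched */
	r->cons_head = head + n;
	return n;
}

/* publish the staged dequeue: tail catches up with the staged head */
static void
serial_dequeue_finish(struct sketch_ring *r, uint32_t num)
{
	r->cons_tail += num;
}

/* drop the staged dequeue: head falls back to the published tail */
static void
serial_dequeue_abort(struct sketch_ring *r)
{
	r->cons_head = r->cons_tail;
}

In the dq_reclaim loop above, abort() corresponds to "wrong period, keep the
element in the queue" and finish() to "can reclaim, remove it".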
> > > > >
> > > > > I think it is a great idea. The other writers would still be
> > > > > polling for the current writer to update the tail or update the
> > > > > head. This makes it a blocking solution.
> > > >
> > > > Yep, it is a blocking one.
> > > >
> > > > > We can make the other threads not poll, i.e. they will quit
> > > > > reclaiming if they see that other writers are dequeuing from the queue.
> > > >
> > > > Actually, I didn't think about that possibility, but yes, it should
> > > > be possible to have _try_ semantics too.
> > > >
> > > > > The other way is to use per-thread queues.
> > > > >
> > > > > The other requirement I see is to support unbounded-size data
> > > > > structures, wherein the data structures do not have a
> > > > > pre-determined number of entries. Also, currently the defer queue
> > > > > size is equal to the total number of entries in a given data
> > > > > structure. There are plans to support a dynamically resizable
> > > > > defer queue. This means memory allocation, which will affect the
> > > > > lock-free-ness of the solution.
> > > > >
> > > > > So, IMO:
> > > > > 1) The API should provide the capability to support different
> > > > > algorithms - maybe through some flags?
> > > > > 2) The requirements for the ring are pretty unique to the problem
> > > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > > the same, skip polling). So, we should probably implement a ring
> > > > > within the RCU library?
> > > >
> > > > Personally, I think such a serialization ring API would be useful
> > > > for other cases too.
> > > > There are a few cases where a user needs to read the contents of the
> > > > queue without removing elements from it.
> > > > For instance, we do use a similar approach inside TLDK to implement
> > > > the TCP transmit queue.
> > > > If such an API existed in DPDK we could just use it straight away,
> > > > without maintaining a separate one.
> > > ok
> > >
> > > >
> > > > >
> > > > > From the timeline perspective, adding all these capabilities would
> > > > > be difficult to get done within the 19.11 timeline. What I have here
> > > > > satisfies my current needs. I suggest that we make provisions in
> > > > > the APIs now to support all these features, but do the
> > > > > implementation in the coming releases.
> > > > > Does this sound ok to you?
> > > >
> > > > Not sure I understand your suggestion here...
> > > > Could you explain it a bit more - how the new API would look and
> > > > what would be left for the future?
> > > For this patch, I suggest we do not add any more complexity. If
> > > someone wants a lock-free/block-free mechanism, it is available by
> > > creating per-thread defer queues.
> > >
> > > We push the following to the future:
> > > 1) Dynamically size-adjustable defer queue. IMO, with this, the
> > > lock-free/block-free reclamation will not be available (memory
> > > allocation requires locking). The memory for the defer queue will be
> > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> >
> > That one is fine by me.
> > In fact, I don't know whether there would be a real use-case for a
> > dynamic defer queue for an rcu var...
> > But I suppose that's a subject for another discussion.
> Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary as the reclamation is done
> regularly.
> If a smaller queue size is used, the queue might get full (even after reclamation), in which case, the queue size should be increased.

I understand the intention.
Though I am not very happy with an approach where, to free one resource, we first have to allocate another one.
It sounds like a source of deadlocks and, in that case, probably an unnecessary complication.
But again, as it is not for 19.11, we don't have to discuss it now.
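
To make that trade-off concrete, here is a minimal sketch of the grow-on-full
idea discussed above. The 'struct dq' layout, the doubling policy and the
function names are assumptions for illustration only; note that the resize
path calls rte_ring_create(), i.e. freeing a resource may first require
allocating a new, larger ring.

#include <rte_ring.h>
#include <rte_errno.h>

struct dq {
	struct rte_ring *r;   /* current defer queue ring */
	uint32_t size;        /* current ring size, a power of two */
};

/* enqueue one deferred element, growing the ring if it is full */
static int
dq_enqueue_grow(struct dq *dq, void *elt, const char *new_name, int socket)
{
	struct rte_ring *bigger;
	void *tmp;

	if (rte_ring_enqueue(dq->r, elt) == 0)
		return 0;

	/* still full after reclaims: allocate a ring twice the size
	 * (new_name must be a unique ring name)
	 */
	bigger = rte_ring_create(new_name, dq->size * 2, socket,
			RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (bigger == NULL)
		return -rte_errno;

	/* migrate the pending entries, then retire the old ring */
	while (rte_ring_dequeue(dq->r, &tmp) == 0)
		(void)rte_ring_enqueue(bigger, tmp);

	rte_ring_free(dq->r);
	dq->r = bigger;
	dq->size *= 2;

	return rte_ring_enqueue(dq->r, elt);
}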
 
> >
> > >
> > > 2) Constant-size defer queue with lock-free and block-free reclamation
> > > (single option). The defer queue will be of fixed length 'size'. If
> > > the queue gets full, an error is returned. The user could provide a
> > > 'size' equal to the number of elements in a data structure to ensure
> > > the queue never gets full.
> >
> > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > - MP/MC
> > - MP/SC
> > - SP/SC
> Just SP/SC

Ok, just to confirm we are on the same page:
there would be a possibility for one thread to do dq_enqueue() and a second one to do dq_reclaim() simultaneously
(of course, only if the actual reclamation function is thread safe)?
 
> > - non MT at all (only the same single thread can do enqueue and dequeue)
> If MT safety is required, one should use 1 defer queue per thread for now.
> 
> >
> > And related question:
> > What additional rte_ring API you plan to introduce in that case?
> > - None
> > - rte_ring_sc_peek()
> rte_ring_peek will be changed to rte_ring_sc_peek
> 
> > - rte_ring_serial_dequeue()
> >
> > >
> > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > 2 #defines, one for a dynamically variable-size defer queue and the
> > > other for a constant-size defer queue.
> > >
> > > However, IMO, using per-thread defer queues is a much simpler way to
> > > achieve 2. It does not add any significant burden to the user either.
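
As an illustration of the per-thread defer queue suggestion, here is a minimal
sketch assuming one SP/SC ring per lcore, so a thread's enqueue and reclaim
never contend with other writers. All names here are invented for the example.

#include <stdio.h>
#include <rte_ring.h>
#include <rte_lcore.h>

static struct rte_ring *dq_per_lcore[RTE_MAX_LCORE];

/* create one single-producer/single-consumer ring per enabled lcore */
static int
dq_per_lcore_init(unsigned int count, int socket)
{
	char name[RTE_RING_NAMESIZE];
	unsigned int lcore;

	RTE_LCORE_FOREACH(lcore) {
		snprintf(name, sizeof(name), "dq_lcore_%u", lcore);
		dq_per_lcore[lcore] = rte_ring_create(name, count, socket,
				RING_F_SP_ENQ | RING_F_SC_DEQ);
		if (dq_per_lcore[lcore] == NULL)
			return -1;
	}
	return 0;
}

/* each thread only ever touches its own defer queue */
static inline struct rte_ring *
dq_mine(void)
{
	return dq_per_lcore[rte_lcore_id()];
}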
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > +	unsigned int n = 1;
> > > > > > > +	if (count) {
> > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > +		return 0;
> > > > > > > +	}
> > > > > > > +	return -ENOENT;
> > > > > > > +}
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.17.1
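
For context, a hedged sketch of how the peek proposed in this patch (whether
it ends up named rte_ring_peek() or rte_ring_sc_peek()) could be used in the
single-consumer reclaim path discussed in this thread. The assumption that the
grace-period token is stored as the first ring element is for illustration
only and does not reflect the actual defer queue layout.

#include <stdint.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* attempt to reclaim the oldest deferred resource, if its grace
 * period has expired
 */
static void
reclaim_one(struct rte_rcu_qsbr *v, struct rte_ring *dq_ring)
{
	void *obj;
	uint64_t token;

	/* inspect the head of the queue without removing it */
	if (rte_ring_peek(dq_ring, &obj) != 0)
		return;                     /* defer queue is empty */

	token = (uintptr_t)obj;

	/* grace period not yet over: leave the entry where it is */
	if (rte_rcu_qsbr_check(v, token, false) != 1)
		return;

	/* safe to reclaim: remove the entry and free the resource */
	(void)rte_ring_sc_dequeue(dq_ring, &obj);
	/* ... call the resource-specific free routine here ... */
}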


