[dpdk-dev] [RFC PATCH] eventdev: add buffered enqueue and flush APIs

Jerin Jacob jerin.jacob at caviumnetworks.com
Wed Dec 14 08:44:17 CET 2016


On Mon, Dec 12, 2016 at 05:56:32PM +0000, Eads, Gage wrote:
> 
> 
> >  -----Original Message-----
> >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  Sent: Wednesday, December 7, 2016 10:42 PM
> >  To: Eads, Gage <gage.eads at intel.com>
> >  1) What if the burst has ATOMIC flows and if we are NOT en-queuing to the
> >  implementation then other event ports won't get the packets from the same
> >  ATOMIC tag ? BAD. Right?
> 
> I'm not sure what scenario you're describing here. The buffered (as implemented in my patch) and non-buffered enqueue operations are functionally the same (as long as the buffer is flushed), the difference lies in when the events are moved from the application level to the PMD.

OK. I will try to explain with a time-line.

Assume,
flush size: 16
burst size: 4

At t0: dequeue 4 events (3 ordered and 1 atomic)
At t1: after processing the events, store them in the local event buffer
At t2: request to dequeue 4 more events

Now, since the scheduler has scheduled an atomic event to the port at t0,
it cannot schedule an atomic event of the same TAG to _any port_ until
that event is enqueued back. An atomic TAG has exactly one in-flight
entry at a time, which is what gives critical-section semantics to the
packet flow.
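
A minimal sketch of that time-line against the eventdev API; dev_id,
port_id, the local buffer and the sizes are assumptions taken from the
example above, not from the RFC patch:

#include <rte_eventdev.h>

#define FLUSH_SIZE 16
#define BURST_SIZE 4

static struct rte_event ev_buf[FLUSH_SIZE];	/* application-level buffer */
static uint16_t ev_buf_count;

static void
worker_iteration(uint8_t dev_id, uint8_t port_id)
{
	struct rte_event ev[BURST_SIZE];
	uint16_t i, nb;

	/* t0: dequeue 4 events (3 ORDERED + 1 ATOMIC) */
	nb = rte_event_dequeue_burst(dev_id, port_id, ev, BURST_SIZE, 0);

	/*
	 * t1: process, then hold the events in the local buffer instead of
	 * enqueuing them back with RTE_EVENT_OP_FORWARD. The flush to the
	 * PMD happens only once FLUSH_SIZE events have accumulated.
	 */
	for (i = 0; i < nb && ev_buf_count < FLUSH_SIZE; i++) {
		ev[i].op = RTE_EVENT_OP_FORWARD;
		ev_buf[ev_buf_count++] = ev[i];
	}

	/*
	 * t2: request 4 more events. The ATOMIC event from t0 has not been
	 * enqueued back, so its TAG is still in-flight on this port and the
	 * scheduler cannot give events of that TAG to any other port.
	 */
	nb = rte_event_dequeue_burst(dev_id, port_id, ev, BURST_SIZE, 0);
	(void)nb;
}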

> 
> >  2) At least, In our HW implementation, The event buffer strategy is more like, if
> >  you enqueue to HW then ONLY you get the events from dequeue provided if op
> >  == RTE_EVENT_OP_FORWARD.So it will create deadlock.i.e application cannot
> >  hold the events with RTE_EVENT_OP_FORWARD
> 
> If I'm reading this correctly, you're concerned that buffered events can result in deadlock if they're not flushed. Whether the buffering is done in the app itself, inline in the API, or in the PMDs, not flushing the buffer is an application bug. E.g. the app could be fixed by flushing its enqueue buffer after processing every burst dequeued event set, or only if dequeue returns 0 events.

No. At least in our HW implementation, the HW maintains the state of the
events scheduled to a port. The driver gets the next set of events ONLY
after it submits the events it already got on the previous dequeue, i.e.
the application cannot hold the events with RTE_EVENT_OP_FORWARD.
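
For reference, a minimal sketch of the loop such HW expects, where every
dequeued event is handed back with RTE_EVENT_OP_FORWARD before the next
dequeue; dev_id, port_id, BURST_SIZE and the commented-out
process_event() are placeholders:

#include <rte_eventdev.h>

#define BURST_SIZE 4

static void
worker_loop(uint8_t dev_id, uint8_t port_id)
{
	struct rte_event ev[BURST_SIZE];
	uint16_t i, nb, sent;

	while (1) {
		nb = rte_event_dequeue_burst(dev_id, port_id, ev,
					     BURST_SIZE, 0);
		if (nb == 0)
			continue;

		for (i = 0; i < nb; i++) {
			/* process_event(&ev[i]); -- application work */
			ev[i].op = RTE_EVENT_OP_FORWARD;
		}

		/* Hand the whole burst back before dequeuing again. */
		for (sent = 0; sent < nb; )
			sent += rte_event_enqueue_burst(dev_id, port_id,
							ev + sent, nb - sent);
	}
}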

> 
> >  3) So considering the above case there is nothing like flush for us
> >  4) In real high throughput benchmark case, we will get the packets at the rate
> >  of max burst and then we always needs to memcpy before we flush.
> >  Otherwise there will be ordering issue as burst can get us the packet from
> >  different flows(unlike polling mode)
> 
> I take it you're referring to the memcpy in the patch, and not an additional memcpy? At any rate, I'm hoping that SIMD instructions can optimize the 16B event copy.

Hmm. The point was that we need to memcpy on every event to maintain the
order.
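
To be explicit about the copy being discussed: any buffered enqueue,
whether done in the API or in the application, ends up staging each 16B
struct rte_event before the real enqueue. The helper names below are
illustrative only, not the API from the RFC patch:

#include <rte_eventdev.h>

#define EV_BUF_SIZE 16

struct ev_enq_buf {
	uint16_t count;
	struct rte_event ev[EV_BUF_SIZE];
};

static inline void
app_buf_flush(uint8_t dev_id, uint8_t port_id, struct ev_enq_buf *b)
{
	uint16_t sent = 0;

	while (sent < b->count)
		sent += rte_event_enqueue_burst(dev_id, port_id,
						b->ev + sent,
						b->count - sent);
	b->count = 0;
}

static inline void
app_buf_add(uint8_t dev_id, uint8_t port_id, struct ev_enq_buf *b,
	    const struct rte_event *ev)
{
	b->ev[b->count++] = *ev;	/* the per-event copy in question */
	if (b->count == EV_BUF_SIZE)
		app_buf_flush(dev_id, port_id, b);
}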

> 
> >  
> >  >
> >  > > and some does not need to hold the buffers if it is DDR backed.
> >  >
> >  
> >  See above. I am not against burst processing in "application".
> >  The flush does not make sense for us in HW perspective and it is costly for us if
> >  we trying generalize it.
> >  
> 
> Besides the data copy that buffering requires, are there additional costs from your perspective?

It won't even work in our case, as the HW maintains context on the
dequeued events.

I suggest checking the function call overhead first. If it turns out to
have an impact on performance, then we can split the flow based on a
capability flag, but I recommend that only as a last option.
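
If such a split ever becomes necessary, one possible shape (not proposed
here) is a device capability bit queried through
rte_event_dev_info_get(); the flag name below is purely hypothetical:

#include <rte_eventdev.h>

/* Hypothetical capability bit -- not part of the eventdev API. */
#define APP_EVENT_DEV_CAP_BUFFERED_ENQ (1U << 30)

static int
use_app_level_buffering(uint8_t dev_id)
{
	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);
	/* Buffer in the application only when the PMD benefits from it. */
	return (info.event_dev_cap & APP_EVENT_DEV_CAP_BUFFERED_ENQ) != 0;
}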

> 
> >  >
> >  > I'm skeptical that other buffering strategies would emerge, but I can only
> >  speculate on Cavium/NXP/etc. NPU software.
> >  >
> >  > > IHMO, This may not be the candidate for common code. I guess you can
> >  > > move this to driver side and abstract under SW driver's enqueue_burst.
> >  > >
> >  >
> >  > I don't think that will work without adding a flush API, otherwise we could
> >  have indefinitely buffered events. I see three ways forward:
> >  
> >  I agree. More portable way is to move the "flush" to the implementation and
> >  "flush"
> >  whenever it makes sense to PMD.
> >  
> >  >
> >  > - The proposed approach
> >  > - Add the proposed functions but make them implementation-specific.
> >  > - Require the application to write its own buffering logic (i.e. no
> >  > API change)
> >  
> >  I think, If the additional function call overhead cost is too much for SW
> >  implementation then we can think of implementation-specific API or custom
> >  application flow based on SW driver.
> >  
> >  But I am not fan of that(but tempted do now a days), If we take that route, we
> >  have truckload of custom implementation specific API and now we try to hide
> >  all black magic under enqueue/dequeue to make it portable at some expense.
> 
> Agreed, it's not worth special-casing the API with this relatively minor addition.
> 
> Thanks,
> Gage

