[RFC] ethdev: fast path async flow API

Konstantin Ananyev konstantin.ananyev at huawei.com
Thu Jan 4 09:47:02 CET 2024



> > This is a blocker, showstopper for me.
> +1
> 
> > Have you considered having something like
> >    rte_flow_create_bulk()
> >
> > or better yet a Linux io_uring style API?
> >
> > A ring style API would allow for better mixed operations across the board and
> > get rid of the I-cache overhead which is the root cause of needing inlining.
> The existing async flow API is somewhat close to the io_uring interface.
> The difference is that the queue is not directly exposed to the application.
> The application interacts with the queue using rte_flow_async_* APIs (e.g., places operations in the queue, pushes them to the HW).
> Such design has some benefits over a flow API which exposes the queue to the user:
> - Easier to use - Applications do not manage the queue directly, they do it through exposed APIs.
> - Consistent with other DPDK APIs - In other libraries, queues are manipulated through an API, not directly by an application.
> - Lower memory usage - Only HW primitives are needed (e.g., HW queue on PMD side), no need to allocate separate application queues.
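
For illustration, the "queue hidden behind an API" model described above can be sketched roughly as follows. This is a toy model only: the flowq_* names, the fixed queue depth and the simulated completion are all assumptions made for the example, not the rte_flow_async_* API or any PMD's implementation. It only mirrors the split between staging an operation, pushing staged operations to the HW, and pulling completions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOWQ_SIZE 8u /* hypothetical queue depth */

/* Hypothetical operation slot; a real PMD would keep HW descriptors here. */
struct flowq_op {
    uint32_t rule_id;
    int done;
};

/* Queue state is owned by the "driver" side; the application never touches
 * the ring directly, it only calls the functions below. */
struct flowq {
    struct flowq_op ring[FLOWQ_SIZE];
    uint32_t head;   /* next free slot (enqueue side) */
    uint32_t pushed; /* operations already handed to the "HW" */
    uint32_t tail;   /* next completion to report */
};

/* Stage one operation; nothing is visible to the "HW" yet. */
static int flowq_enqueue(struct flowq *q, uint32_t rule_id)
{
    assert(q != NULL);
    if (q->head - q->tail == FLOWQ_SIZE)
        return -1; /* queue full */
    q->ring[q->head % FLOWQ_SIZE] =
        (struct flowq_op){ .rule_id = rule_id, .done = 0 };
    q->head++;
    return 0;
}

/* Hand all staged operations to the "HW". The toy model completes them
 * immediately. Returns the number of operations pushed. */
static uint32_t flowq_push(struct flowq *q)
{
    uint32_t n = 0;

    while (q->pushed != q->head) {
        q->ring[q->pushed % FLOWQ_SIZE].done = 1;
        q->pushed++;
        n++;
    }
    return n;
}

/* Collect up to max completions; returns how many were written to ids. */
static uint32_t flowq_pull(struct flowq *q, uint32_t *ids, uint32_t max)
{
    uint32_t n = 0;

    while (n < max && q->tail != q->pushed) {
        ids[n++] = q->ring[q->tail % FLOWQ_SIZE].rule_id;
        q->tail++;
    }
    return n;
}
```

In this model an application would call flowq_enqueue() a few times, then flowq_push() once, and later poll flowq_pull(); it never allocates or indexes the ring itself.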
> 
> Bulking of flow operations is a tricky subject.
> Unlike packet processing, where it is desirable to keep the manipulation of raw packet data to a minimum (e.g., only packet headers are accessed),
> during flow rule creation all items and actions must be processed by the PMD to create a flow rule.
> The amount of memory consumed by the items and actions themselves during this process might be non-negligible.
> If flow rule operations were bulked, the size of the memory working set would increase, which could have negative consequences for cache behavior.
> So, it might be the case that bulking removes the I-cache overhead but adds D-cache overhead.

Is the rte_flow struct really that big?
We do bulk processing for mbufs, crypto_ops, etc., and usually bulk processing improves performance rather than degrading it.
Of course, the bulk size has to be reasonable.
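
To make the trade-off concrete, here is a rough sketch comparing one-by-one and burst submission. Everything in it is an assumption for illustration: the demo_* names, the 256-byte item/action buffers and the call counter are made up and are not part of rte_flow. It only shows that a burst pays the per-call driver entry cost once, while the descriptor memory kept live grows linearly with the burst size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-operation descriptor. The 256-byte buffers stand in for
 * the items and actions an application fills in; real sizes vary. */
struct demo_flow_op {
    uint8_t items[256];
    uint8_t actions[256];
    uint32_t rule_id;
};

/* Counts entries into the (simulated) driver, i.e. per-call overhead. */
static unsigned int demo_driver_calls;

/* One operation per driver call: the caller can reuse a single descriptor. */
static uint32_t demo_flow_create_one(const struct demo_flow_op *op)
{
    demo_driver_calls++;
    return op->rule_id + 1000u; /* fake rule handle */
}

/* A burst of operations per driver call: all n descriptors must be
 * populated before the call, so they are all live at the same time. */
static unsigned int demo_flow_create_burst(const struct demo_flow_op *ops,
                                           unsigned int n, uint32_t *handles)
{
    assert(ops != NULL && handles != NULL);
    demo_driver_calls++;
    for (unsigned int i = 0; i < n; i++)
        handles[i] = ops[i].rule_id + 1000u;
    return n;
}

/* Descriptor memory the application keeps live for a given burst size. */
static size_t demo_working_set_bytes(unsigned int burst)
{
    return (size_t)burst * sizeof(struct demo_flow_op);
}
```

With a moderate burst size the value returned by demo_working_set_bytes() stays small compared to typical L1/L2 data cache sizes, which is the point about keeping the bulk size reasonable.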

> On the other hand, creating flow rule operations (or enqueuing flow rule operations) one by one enables applications to reuse the same memory for different flow rules.
> 
> In summary, in my opinion, extending the async flow API with bulking capabilities or exposing the queue directly to the application is not desirable.
> This proposal aims to reduce the I-cache overhead in the async flow API by reusing an existing design pattern in DPDK - fast path functions are inlined into the application code and they call cached PMD callbacks.
> 
> Best regards,
> Dariusz Sosnowski
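
The design pattern referred to in the quoted proposal (inline fast path functions calling cached PMD callbacks) can be sketched as below. This is a simplified model under stated assumptions: the demo_* names, the table size and the callback signature are hypothetical and do not reproduce the proposed patch or the actual ethdev structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4u /* hypothetical number of ports */

/* Callback type a PMD would provide (hypothetical signature). */
typedef int (*demo_flow_create_t)(void *ctx, uint32_t rule_id);

/* Per-port cache of fast path callbacks and the PMD private context. */
struct demo_flow_fp_ops {
    demo_flow_create_t async_create;
    void *ctx;
};

/* Flat table indexed by port id, filled in once at device setup time. */
static struct demo_flow_fp_ops demo_fp_ops_table[DEMO_MAX_PORTS];

/* Slow path: the PMD registers its callbacks for a port. */
static void demo_flow_fp_ops_register(uint16_t port_id,
                                      demo_flow_create_t cb, void *ctx)
{
    assert(port_id < DEMO_MAX_PORTS && cb != NULL);
    demo_fp_ops_table[port_id].async_create = cb;
    demo_fp_ops_table[port_id].ctx = ctx;
}

/* Fast path: inlined into the application, so the only out-of-line call
 * is the PMD callback itself. No argument checks, as is usual for fast
 * path wrappers; the caller must pass a registered port id. */
static inline int demo_flow_async_create(uint16_t port_id, uint32_t rule_id)
{
    const struct demo_flow_fp_ops *ops = &demo_fp_ops_table[port_id];

    return ops->async_create(ops->ctx, rule_id);
}

/* Dummy PMD state and callback used to exercise the wrapper. */
struct demo_pmd {
    uint32_t created;
    uint32_t last_rule_id;
};

static int demo_pmd_async_create(void *ctx, uint32_t rule_id)
{
    struct demo_pmd *pmd = ctx;

    pmd->created++;
    pmd->last_rule_id = rule_id;
    return 0;
}
```

The wrapper costs one table lookup and one indirect call per operation, without passing through a generic out-of-line library function first.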

