[RFC] ethdev: fast path async flow API

Dariusz Sosnowski dsosnowski at nvidia.com
Wed Jan 3 20:14:49 CET 2024


> -----Original Message-----
> From: Stephen Hemminger <stephen at networkplumber.org>
> Sent: Thursday, December 28, 2023 18:17
> > However, at the moment I see one problem with this approach.
> > It would require DPDK to expose the rte_eth_dev struct definition,
> > because of implied locking implemented in the flow API.
> 
> This is a blocker, showstopper for me.
+1

> Have you considered having something like
>    rte_flow_create_bulk()
> 
> or better yet a Linux iouring style API?
> 
> A ring style API would allow for better mixed operations across the board and
> get rid of the I-cache overhead which is the root cause of the needing inline.
The existing async flow API is already somewhat close to the io_uring interface.
The difference is that the queue is not directly exposed to the application.
The application interacts with the queue through the rte_flow_async_* APIs (e.g., it places operations in the queue and pushes them to the HW).
Such a design has some benefits over a flow API that exposes the queue to the user:
- Easier to use - applications do not manage the queue directly; they do it through the exposed APIs.
- Consistent with other DPDK APIs - in other libraries, queues are manipulated through an API, not directly by the application.
- Lower memory usage - only HW primitives are needed (e.g., the HW queue on the PMD side); there is no need to allocate separate application-level queues.
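
To illustrate the analogy, here is a minimal sketch of how an application drives the queue through the existing rte_flow_async_* calls (assuming the port, flow queues and template table are already configured; error handling trimmed):

    #include <rte_flow.h>

    static void
    insert_one_rule(uint16_t port_id, uint32_t queue_id,
                    struct rte_flow_template_table *table,
                    const struct rte_flow_item pattern[],
                    const struct rte_flow_action actions[])
    {
            /* Postpone the doorbell so that multiple operations could be
             * pushed to HW with a single rte_flow_push() call. */
            const struct rte_flow_op_attr op_attr = { .postpone = 1 };
            struct rte_flow_op_result result;
            struct rte_flow_error error;
            struct rte_flow *flow;

            /* Enqueue a flow rule creation operation on the given flow queue,
             * using pattern/actions template index 0 of the table. */
            flow = rte_flow_async_create(port_id, queue_id, &op_attr, table,
                                         pattern, 0, actions, 0,
                                         NULL /* user_data */, &error);
            if (flow == NULL)
                    return;
            /* Ring the doorbell - push pending operations to HW. */
            rte_flow_push(port_id, queue_id, &error);
            /* Poll for the completion. */
            while (rte_flow_pull(port_id, queue_id, &result, 1, &error) == 0)
                    ;
    }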

Bulking of flow operations is a tricky subject.
Compared to packet processing, where manipulation of raw packet data is kept to a minimum (e.g., only packet headers are accessed),
during flow rule creation all items and actions must be processed by the PMD to create a flow rule.
The amount of memory consumed by the items and actions themselves during this process might be non-negligible.
If flow rule operations were bulked, the size of the working set of memory would increase, which could have negative consequences for cache behavior.
So, it might be the case that bulking removes the I-cache overhead but adds D-cache overhead instead.
On the other hand, creating (or enqueuing) flow rule operations one by one enables applications to reuse the same memory for different flow rules, as sketched below.
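
For example, with the current one-by-one model an application can keep a single pattern/action buffer on the stack and only rewrite the fields that differ between rules (the rule contents below are made up purely for illustration):

    static void
    insert_drop_rules(uint16_t port_id, uint32_t queue_id,
                      struct rte_flow_template_table *table,
                      const rte_be32_t *dst_addrs, unsigned int nb_rules)
    {
            const struct rte_flow_op_attr op_attr = { .postpone = 1 };
            struct rte_flow_item_ipv4 ipv4_spec = { 0 };
            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_ETH },
                    { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_spec },
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_DROP },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };
            struct rte_flow_error error;
            unsigned int i;

            for (i = 0; i < nb_rules; i++) {
                    /* Only the matched destination address changes between
                     * rules; pattern[] and actions[] are reused as-is. */
                    ipv4_spec.hdr.dst_addr = dst_addrs[i];
                    rte_flow_async_create(port_id, queue_id, &op_attr, table,
                                          pattern, 0, actions, 0,
                                          NULL, &error);
            }
            rte_flow_push(port_id, queue_id, &error);
    }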

In summary, in my opinion, extending the async flow API with bulking capabilities or exposing the queue directly to the application is not desirable.
This proposal aims to reduce the I-cache overhead of the async flow API by reusing an existing design pattern in DPDK - fast path functions are inlined into the application code and call cached PMD callbacks.
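
In other words, the intent is to follow the same pattern as rte_eth_rx_burst() and rte_eth_fp_ops: a flat, per-port table of cached callbacks dispatched from a static inline wrapper. A rough, hypothetical sketch of what that could look like for rte_flow_async_create() (names and layout are illustrative only, not the final proposal):

    /* Per-port table of fast path flow callbacks, filled by ethdev at
     * device configuration time (illustrative sketch). */
    struct rte_flow_fp_ops {
            struct rte_flow *(*async_create)(void *ctx, uint32_t queue_id,
                            const struct rte_flow_op_attr *attr,
                            struct rte_flow_template_table *table,
                            const struct rte_flow_item pattern[],
                            uint8_t pattern_template_index,
                            const struct rte_flow_action actions[],
                            uint8_t actions_template_index,
                            void *user_data,
                            struct rte_flow_error *error);
            void *ctx; /* PMD-private context cached at configuration time */
    };

    extern struct rte_flow_fp_ops rte_flow_fp_ops[RTE_MAX_ETHPORTS];

    /* Inlined into application code; only the cached callback table is
     * accessed, no dereference of struct rte_eth_dev is needed. */
    static inline struct rte_flow *
    rte_flow_async_create(uint16_t port_id, uint32_t queue_id,
                          const struct rte_flow_op_attr *attr,
                          struct rte_flow_template_table *table,
                          const struct rte_flow_item pattern[],
                          uint8_t pattern_template_index,
                          const struct rte_flow_action actions[],
                          uint8_t actions_template_index,
                          void *user_data,
                          struct rte_flow_error *error)
    {
            struct rte_flow_fp_ops *ops = &rte_flow_fp_ops[port_id];

            return ops->async_create(ops->ctx, queue_id, attr, table,
                                     pattern, pattern_template_index,
                                     actions, actions_template_index,
                                     user_data, error);
    }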

Best regards,
Dariusz Sosnowski

