[RFC] ethdev: fast path async flow API

Ferruh Yigit ferruh.yigit at amd.com
Mon Jan 29 18:36:15 CET 2024


On 1/29/2024 1:38 PM, Dariusz Sosnowski wrote:
> Hi all,
> 
>> -----Original Message-----
>> From: Dariusz Sosnowski <dsosnowski at nvidia.com>
>> Sent: Tuesday, January 23, 2024 12:37
>> To: Stephen Hemminger <stephen at networkplumber.org>
>> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
>> Ferruh Yigit <ferruh.yigit at amd.com>; Andrew Rybchenko
>> <andrew.rybchenko at oktetlabs.ru>; Ori Kam <orika at nvidia.com>;
>> dev at dpdk.org
>> Subject: RE: [RFC] ethdev: fast path async flow API
>>
>> Hi Stephen,
>>
>> Sorry for such a late response.
>>
>> From: Stephen Hemminger <stephen at networkplumber.org>
>> Sent: Thursday, January 4, 2024 02:08
>>> On Wed, 3 Jan 2024 19:14:49 +0000
>>> Dariusz Sosnowski <dsosnowski at nvidia.com> wrote:
>>>> In summary, in my opinion extending the async flow API with bulking
>>>> capabilities or exposing the queue directly to the application is not
>>>> desirable.
>>>> This proposal aims to reduce the I-cache overhead in async flow API by
>>>> reusing the existing design pattern in DPDK - fast path functions are
>>>> inlined to the application code and they call cached PMD callbacks.
>>>
>>> Inline needs to be more discouraged in DPDK, because it only works if
>>> the application ends up building with DPDK from source.
>>> It doesn't work for the Linux distro packaging model and symbol
>>> versioning, etc.
>>
>> I understand the problem. In that case, I have a proposal.
>> I had a chat with Thomas regarding this RFC, and he noticed that there are 2
>> separate changes proposed here:
>>
>> 1. Per-port callbacks for async flow API.
>>     - Moves specified callbacks to rte_flow_fp_ops struct and allows PMDs to
>> register them dynamically.
>>     - Removes indirection at API level - no need to call rte_flow_ops_get().
>>     - Removes checking if callbacks are defined - either the default DPDK callback
>> is used or the one provided by PMD.
>> 2. Make async flow API functions inlineable.
>>
>> Change (1) won't break ABI (existing callbacks in rte_flow_ops can be marked
>> as deprecated for now and phased out later) and in my opinion is less
>> controversial compared to change (2).
>>
>> I'll rerun the benchmarks for both changes separately to compare their
>> benefits in isolation.
>> This would allow us to decide if change (2) is worth the drawbacks it
>> introduces.
>>
>> What do you think?
> 
> I split the RFC into 2 parts:
> 
> 1. Introduce per-port callbacks:
>     - Introduce rte_flow_fp_ops struct - holds callbacks for fast path calls, for each port. PMD registers callbacks through rte_flow_fp_ops_register().
>     - Relevant rte_flow_async_* functions just pass arguments to fast path callbacks. Validation checks are done only if RTE_FLOW_DEBUG macro is defined.
>     - Biggest difference is the removal of callback access through rte_flow_get_ops().
> 2. Inline async flow API functions.
>     - Relevant rte_flow_async_* function definitions are moved to rte_flow.h and made inlineable.
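
A minimal sketch of what change (1) could boil down to is shown below, purely
for illustration; the struct layout, the per-port array and the debug-only
checks are assumptions based on the description above, not the actual patch:

/* In lib/ethdev (sketch only): */

#include <errno.h>
#include <rte_errno.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Per-port fast path callbacks; layout is illustrative. */
struct rte_flow_fp_ops {
    void *ctx; /* PMD private context handed back on every call */
    struct rte_flow *(*async_create)(void *ctx, uint32_t queue_id,
            const struct rte_flow_op_attr *op_attr,
            struct rte_flow_template_table *template_table,
            const struct rte_flow_item pattern[],
            uint8_t pattern_template_index,
            const struct rte_flow_action actions[],
            uint8_t actions_template_index,
            void *user_data, struct rte_flow_error *error);
    int (*async_destroy)(void *ctx, uint32_t queue_id,
            const struct rte_flow_op_attr *op_attr,
            struct rte_flow *flow,
            void *user_data, struct rte_flow_error *error);
    /* ... push, pull and the remaining async callbacks ... */
};

/* Filled with default (dummy) callbacks at init; a PMD overwrites its
 * port's entry through rte_flow_fp_ops_register(). */
struct rte_flow_fp_ops rte_flow_fp_ops[RTE_MAX_ETHPORTS];

struct rte_flow *
rte_flow_async_create(uint16_t port_id, uint32_t queue_id,
        const struct rte_flow_op_attr *op_attr,
        struct rte_flow_template_table *template_table,
        const struct rte_flow_item pattern[],
        uint8_t pattern_template_index,
        const struct rte_flow_action actions[],
        uint8_t actions_template_index,
        void *user_data, struct rte_flow_error *error)
{
    struct rte_flow_fp_ops *ops;

#ifdef RTE_FLOW_DEBUG
    /* Validation is compiled in only for debug builds. */
    if (port_id >= RTE_MAX_ETHPORTS) {
        rte_flow_error_set(error, ENODEV,
                RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
                rte_strerror(ENODEV));
        return NULL;
    }
#endif
    ops = &rte_flow_fp_ops[port_id];
    /* No rte_flow_get_ops() and no NULL checks on the release fast path:
     * unregistered ports keep the default callbacks. */
    return ops->async_create(ops->ctx, queue_id, op_attr, template_table,
            pattern, pattern_template_index,
            actions, actions_template_index,
            user_data, error);
}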
> 
> Here are the results of the benchmark:
> 
> |                       | Avg Insertion | Diff over baseline | Avg Deletion | Diff over baseline |
> |-----------------------|---------------|--------------------|--------------|--------------------|
> | baseline (v24.03-rc0) |     5855.4    |                    |    9026.8    |                    |
> | applied (1)           |     6384.2    |    +528.8 (+9%)    |    10054.2   |  +1027.4 (+11.4%)  |
> | applied (2)           |     6434.6    |   +579.2 (+9.9%)   |    10011.4   |   +984.6 (+10.9%)  |
> 
> Results are in KFlows/sec.
> The benchmark is continuously inserting and deleting 2M flow rules.
> These rules match on IPv4 destination address and with a single action DROP.
> Flow rules are inserted and deleted using a single flow queue.
> 
> Change (2) improves insertion rate performance by ~1% compared to (1), but decreases deletion rate by ~0.5%.
> Based on these results, I think we can say that making rte_flow_async_*() calls inlineable may not be worth it compared to the issues it causes.
> 
> What do you all think about the results?
> 

Hi Dariusz,

As discussed before, converting APIs to static inline functions or
exposing 'rte_eth_dev' has downsides, but applying only the first part
above seems to avoid them while still giving a reasonable performance
gain, so I think we can continue with this approach and proceed with
the reviews.

Most of the 'rte_flow_async_*' functions are already missing the function
validity check, so having a default (dummy?) rte_flow_fp_ops for them even
seems like an improvement there.
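
Continuing the sketch from above, a default entry could look roughly as
follows; the callback name, prototype and errno choice are illustrative
assumptions:

/* Illustrative default callback, installed for every port until (and
 * unless) the PMD registers its own fast path ops. Returning an error
 * here replaces the per-callback NULL checks in the hot path. */
static struct rte_flow *
rte_flow_dummy_async_create(void *ctx __rte_unused,
        uint32_t queue_id __rte_unused,
        const struct rte_flow_op_attr *op_attr __rte_unused,
        struct rte_flow_template_table *template_table __rte_unused,
        const struct rte_flow_item pattern[] __rte_unused,
        uint8_t pattern_template_index __rte_unused,
        const struct rte_flow_action actions[] __rte_unused,
        uint8_t actions_template_index __rte_unused,
        void *user_data __rte_unused,
        struct rte_flow_error *error)
{
    rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
            NULL, rte_strerror(ENOSYS));
    return NULL;
}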


One concern: previously 'struct rte_flow_ops' was coming only from drivers,
and the ethdev layer didn't need to maintain anything.
But with the 'rte_flow_fp_ops' struct, ethdev needs to store another array
of fixed ('RTE_MAX_ETHPORTS') size, which will be one more blocker in the
future when converting these fixed arrays to dynamically allocated arrays.

For this, does it still help if we add a new field to "struct
rte_eth_dev", like "struct rte_flow_ops *flow_ops", similar to 'dev_ops'?
The indirection will still be there, but it eliminates the
'rte_flow_get_ops()' call and the checks that come with it.
If this makes sense, is there a chance to experiment with this and get
some performance numbers for it?
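
For reference, with such a field the fast path could look roughly like the
snippet below; the 'flow_ops' field is hypothetical and the callback
prototype shown is only an approximation:

/* Sketch of the alternative: the ops pointer lives in struct rte_eth_dev,
 * next to dev_ops, so no extra RTE_MAX_ETHPORTS-sized array is needed in
 * the rte_flow layer. The "flow_ops" field does not exist today. */
struct rte_flow *
rte_flow_async_create(uint16_t port_id, uint32_t queue_id,
        const struct rte_flow_op_attr *op_attr,
        struct rte_flow_template_table *template_table,
        const struct rte_flow_item pattern[],
        uint8_t pattern_template_index,
        const struct rte_flow_action actions[],
        uint8_t actions_template_index,
        void *user_data, struct rte_flow_error *error)
{
    struct rte_eth_dev *dev = &rte_eth_devices[port_id];

    /* One pointer dereference remains, but rte_flow_get_ops() and its
     * error handling are gone, provided flow_ops always points either to
     * the PMD's ops or to a default (dummy) ops struct. */
    return dev->flow_ops->async_create(dev, queue_id, op_attr,
            template_table,
            pattern, pattern_template_index,
            actions, actions_template_index,
            user_data, error);
}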


